Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders

K. Deepak¹ · G. Srivathsan¹ · S. Roshan¹ · S. Chandrakala¹

Received: 21 January 2020 / Revised: 7 August 2020 / Accepted: 11 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Visual perception is a transformative technology that recognizes patterns in environments from visual inputs. Automatic surveillance of human activities has gained significant importance in both public and private spaces. Understanding the complex dynamics of events in real-time scenarios is often difficult because of camera movements, cluttered backgrounds, and occlusion. Existing anomaly detection systems are often ineffective because activities exhibit high intra-class variation and high inter-class similarity. Hence, there is a need to exploit different kinds of information extracted from surveillance videos to improve overall performance. This can be achieved by learning features from multiple forms (views) of the given raw input data. We propose two novel methods based on the multi-view representation learning framework. The first is a hybrid multi-view representation learning approach that combines deep features extracted from a 3D spatiotemporal autoencoder (3D-STAE) with robust handcrafted features based on spatiotemporal autocorrelation of gradients (STACOG). The second is a deep multi-view representation learning approach that combines deep features extracted from two-stream STAEs to detect anomalies. Results on three standard benchmark datasets, namely Avenue, Live Videos, and BEHAVE, show that the proposed multi-view representations, modeled with a one-class SVM, perform significantly better than most recent state-of-the-art methods.
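The general pattern the abstract describes can be sketched as follows: features from several views of each clip are fused, and a one-class SVM is trained on normal clips only. This is a minimal illustration under stated assumptions, not the authors' implementation. The two feature matrices are synthetic stand-ins for 3D-STAE bottleneck features and STACOG descriptors. Fusion is assumed to be per-view standardization followed by concatenation. The helper `fuse_views` and the `nu`/`gamma` settings are hypothetical choices. The sketch uses scikit-learn's `StandardScaler` and `OneClassSVM`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fuse_views(views, scalers=None):
    """Hypothetical early fusion: standardize each view, then concatenate.

    views: list of arrays, each of shape (n_clips, d_v) for view v.
    scalers: per-view StandardScalers fitted on training data; fitted here if None.
    """
    if scalers is None:
        scalers = [StandardScaler().fit(v) for v in views]
    fused = np.hstack([s.transform(v) for s, v in zip(scalers, views)])
    return fused, scalers


rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption) for the two views of each clip:
#   deep view  ~ bottleneck features of a spatiotemporal autoencoder (64-d)
#   hand view  ~ handcrafted gradient-autocorrelation descriptor (32-d)
deep_train = rng.normal(0.0, 1.0, size=(500, 64))
hand_train = rng.normal(0.0, 1.0, size=(500, 32))

# Training uses normal clips only; the one-class SVM models their support.
X_train, scalers = fuse_views([deep_train, hand_train])
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

# Test clips: 20 drawn like the training data, 20 shifted far away (anomalies).
deep_test = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(6, 1, (20, 64))])
hand_test = np.vstack([rng.normal(0, 1, (20, 32)), rng.normal(6, 1, (20, 32))])
X_test, _ = fuse_views([deep_test, hand_test], scalers)  # reuse training scalers

labels = ocsvm.predict(X_test)            # +1 = normal, -1 = anomalous
scores = ocsvm.decision_function(X_test)  # lower = more anomalous
```

Standardizing each view before concatenation keeps a view with larger raw magnitudes from dominating the RBF kernel distances. The `nu` parameter upper-bounds the fraction of training clips treated as outliers, so it controls how readily normal clips are flagged.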

Corresponding author: S. Chandrakala

¹ Intelligent Systems Group, School of Computing, SASTRA University, Thanjavur 613401, India

Circuits, Systems, and Signal Processing

Keywords Video anomaly detection · 3D spatiotemporal autoencoder · Multi-view representation learning · Spatiotemporal autocorrelation of gradients (STACOG) · One-class SVM

1 Introduction

Visual perception helps in understanding patterns in environments through visual inputs. Video information processing is an essential step toward building applications for tasks such as video annotation [9,20], facial expression recognition [35], and visual surveillance [2]. In recent times, automated video surveillance has gained significant interest in the field of computer vision [34,42]. Extensive research in automated surveillance can drastically reduce the laborious burden of manual monitoring and thereby shorten response times. One such crucial task in video surveillance is detecting anomalous events [53]. Events that deviate from normal contextual behavior can be termed anomalous events. Detecting such events can be challengin