People Counting in Videos by Fusing Temporal Cues from Spatial Context-Aware Convolutional Neural Networks

We present an efficient method for people counting in video sequences from fixed cameras by utilising the responses of spatially context-aware convolutional neural networks (CNN) in the temporal domain. For stationary cameras, the background information r

PDF / 1,389,050 Bytes
13 Pages / 439.37 x 666.142 pts Page_size
71 Downloads / 212 Views

DOWNLOAD

REPORT

School of Computer Science and Mathematics, Kingston University, Kingston, UK [email protected], [email protected] 2 Department of Computer Science, Universidad Carlos III de Madrid, Getafe, Spain [email protected] 3 Departmento de Informática, Universidad de Santiago de Chile, Santiago, Chile [email protected] 4 Faculty of Engineering and Applied Sciences, Universidad de los Andes, Santiago, Chile [email protected]

Abstract. We present an efﬁcient method for people counting in video sequences from ﬁxed cameras by utilising the responses of spatially context-aware convolutional neural networks (CNN) in the temporal domain. For stationary cameras, the background information remains fairly static, while foreground characteristics, such as size and orientation may depend on their image location, thus the use of whole frames for training a CNN improves the differentiation between background and foreground pixels. Foreground density representing the presence of people in the environment can then be associated with people counts. Moreover the fusion, of the responses of count estimations, in the temporal domain, can further enhance the accuracy of the ﬁnal count. Our methodology was tested using the publicly available Mall dataset and achieved a mean deviation error of 0.091. Keywords: People counting Convolutional neural networks Video analysis

1 Introduction Counting people can provide useful information for monitoring purposes in public areas, assist urban planners in designing more efﬁcient environments, provide cues for situations that might endanger the safety of civilians, and also be used by shopping mall and retail store managers for evaluating their business practices. In principle, such knowledge can be obtained by analysing image and video footage from location speciﬁc cameras with the goal to measure the number of people in them. For this reason in this work we present an efﬁcient method for counting people in images and video © Springer International Publishing Switzerland 2016 G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 655–667, 2016. DOI: 10.1007/978-3-319-48881-3_46

656

P. Sourtzinos et al.

sequences, from ﬁxed cameras which incorporates the fusion of context aware cues from CNN in the temporal domain. People counting is a very challenging problem, and although commercial solutions exist, these focus mainly in top-view cameras, where occlusions between people are minimal. An effective approach is to detect the heads of the pedestrians present in an image, since they are less prone to disappear in the image through occlusions, and then sum the head detections to measure the total count. Such an approach seems consistent to how humans would approach the problem, as implied by expressions such as ‘headcount’. Furthermore since our interest is in measuring the count of people using stationary cameras, where background is assumed fairly static, a local context-aware detector that is spatially tuned to distinguish foreground objects (e.g. hea

Data Loading...

People Counting in Videos by Fusing Temporal Cues from Spatial Context-Aware Convolutional Neural Networks

Recommend Documents

Deep Convolutional Neural Networks for Human Embryonic Cell Counting

Exploring Temporal Differences in 3D Convolutional Neural Networks

Spatial Graph Convolutional Networks

Cell Counting by Regression Using Convolutional Neural Network

Convolutional Neural Networks

Coronary artery segmentation in angiographic videos utilizing spatial-temporal information

Learning Temporal Transformations from Time-Lapse Videos

Time Series Encodings with Temporal Convolutional Networks

Detecting New Events from Microblogs Using Convolutional Neural Networks

Two stages double attention convolutional neural network for crowd counting

Does Removing Pooling Layers from Convolutional Neural Networks Improve Results?

Evaluating Visual Cues from Fabric Stretch Videos: A Study