Multi-stream Deep Networks for Person to Person Violence Detection in Videos

Abstract. Violence detection in videos has numerous applications, ranging from parental control and child protection to multimedia filtering and retrieval. A number of approaches have been proposed to detect vital clues for violent actions, most of which employ trajectory-based action recognition techniques. However, these methods model only the general characteristics of human actions and therefore cannot adequately capture the specific high-order information of violent actions. As a result, they are not well suited to detecting violence, which is typically intense and correlated with specific scenes. In this paper, we propose a novel framework, i.e., multi-stream deep convolutional neural networks, for person to person violence detection in videos. In addition to the conventional spatial and temporal streams, we develop an acceleration stream to capture the important intensity information usually involved in violent actions. Moreover, a simple and effective score-level fusion strategy is proposed to integrate the multi-stream information. We demonstrate the effectiveness of our method on a typical violence dataset, and extensive experimental results show its superiority over state-of-the-art methods.

Keywords: Violence detection · Acceleration feature · Convolutional neural networks · Long short-term memory

1 Introduction

With the rapid development of digital media, massive collections of video material have become ubiquitous online. Detecting different types of human actions has a wide range of applications. Among these, detecting violent actions in videos has recently received considerable attention, both to protect children from offensive video content and to enable content-based video filtering and retrieval. Violence detection poses big challenges to the computer vision community. On the one hand, because of its subjective nature, the concept of violence is ambiguous to define. Here, we adopt the common definition from VSD [1], i.e., physical violence or accident resulting in human injury or pain. On the other hand, violence detection in surveillance videos always turns into a crowd scene analysis problem.

© Springer Nature Singapore Pte Ltd. 2016. T. Tan et al. (Eds.): CCPR 2016, Part I, CCIS 662, pp. 517–531, 2016. DOI: 10.1007/978-981-10-3002-4_43

Z. Dong et al.

In this paper, we are specifically interested in content-based person to person violence detection at a relatively short distance in videos. To address this problem, previous researchers have preferred trajectory-based action recognition techniques [2,3,11]. Conventional approaches often follow the standard bag-of-words pipeline for representing general human actions. Specifically, they first extract several types of features from entire videos, then quantize the features into histograms using k-means clustering, VLAD [29] or Fisher Vector [19]. The key step of these methods is extracting proper features to model human actions. For instance, improved dense trajectory [26] extracts Motion
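The quantization step of the bag-of-words pipeline described above can be illustrated with a minimal sketch. It is not the implementation used in the cited works: the function names `kmeans_codebook` and `bow_histogram`, the codebook size, the descriptor dimension, and the random descriptors standing in for real trajectory features are all assumptions made for illustration, and only the plain k-means variant is shown (not VLAD or Fisher Vector).

```python
import numpy as np


def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Learn k visual words from (n, d) descriptors with plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct descriptors chosen at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Quantize one video's descriptors into a normalized word histogram."""
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()


# Hypothetical data: random vectors stand in for local video features.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(500, 16))  # pooled over training videos
video_descriptors = rng.normal(size=(120, 16))  # features of one test video

codebook = kmeans_codebook(train_descriptors, k=8)
hist = bow_histogram(video_descriptors, codebook)
```

The resulting `hist` is a fixed-length vector regardless of how many local features the video produced, which is what lets a standard classifier operate on videos of different lengths.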
