Pedestrian Behavior Understanding and Prediction with Deep Neural Networks

1 Department of Electronic Engineering, Chinese University of Hong Kong, Hong Kong, China {syi,hsli,xgwang}@ee.cuhk.edu.hk
2 Sensetime Group Limited, Hong Kong, China [email protected]

Abstract. In this paper, a deep neural network (Behavior-CNN) is proposed to model pedestrian behaviors in crowded scenes, a task with many applications in surveillance. A pedestrian behavior encoding scheme is designed to provide a general representation of walking paths, which can be used as both the input and the output of a CNN. The proposed Behavior-CNN is trained with real-scene crowd data and then thoroughly investigated from multiple aspects, including the location map and location awareness property, the semantic meanings of learned filters, and the influence of receptive fields on behavior modeling. Multiple applications, including walking path prediction, destination prediction, and tracking, demonstrate the effectiveness of Behavior-CNN for pedestrian behavior modeling.

1 Introduction

Pedestrian behavior modeling is gaining increasing attention and can be used for various applications, including behavior prediction [1–4], pedestrian detection and tracking [5–7], crowd motion analysis [8–11], and abnormality detection [12–14]. Modeling pedestrian behaviors is challenging because pedestrian decision making is complex and can be influenced by various factors. The decision making process of individuals [15], the interactions among moving and stationary pedestrians [4,16], and historical motion statistics of a scene all provide information for predicting future behaviors of pedestrians. While existing works focused on some of these aspects with simplified rules or energy functions [15,17], our proposed model takes all these factors into account through a complex deep convolutional neural network (Behavior-CNN) and makes more reliable predictions.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 263–279, 2016. DOI: 10.1007/978-3-319-46448-0_16

Fig. 1. Prediction results of the proposed Behavior-CNN (a) and the Social Force Model [15] (b). The input, predicted, and ground-truth walking paths are shown as blue, red, and green dots, respectively. Prediction results are shown for only some of the pedestrians. (c) Illustration of association ambiguity in dense flow maps. (Color figure online)

When using deep neural networks to model pedestrian behaviors, the main difficulty is how to make good use of pedestrian walking information as the network input. A straightforward way is to use dense optical flow maps to describe the motion of a whole frame. However, this introduces ambiguities when merging and splitting events occur, as they frequently do in crowded scenes. As shown in Fig. 1(c), two separate pedestrians A and B at time t − 1 move to occlude each other at location C at time t. The two flow vectors (A → C) and (B → C) describe the associations between t − 1 and t. If the two pedestrians move to locations D and E at t + 1 with flow vectors (C → D) and (C →