Two stages double attention convolutional neural network for crowd counting



Zhao Zou 1 & Chaofeng Li 1 & Yuhui Zheng 2 & Shoukun Xu 3

Received: 12 April 2020 / Revised: 9 July 2020 / Accepted: 4 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Crowd counting, which aims to accurately count the number of people in still images or video scenes, has attracted wide attention in computer vision. However, it remains a challenging task because of scale variation and cluttered backgrounds in crowd scenes. In this paper, we propose a two-stage double attention convolutional neural network for crowd counting, called 2-DA-CNN, which addresses both scale variation and cluttered backgrounds. The proposed 2-DA-CNN consists of three parts. The first part is the front-end module, a set of convolution operations that extracts rich crowd features. The second part is the first double attention module, which contains a trunk branch and a mask branch. The trunk branch is mainly composed of a multi-column CNN module, which handles scale variation in crowd scenes. The mask branch generates two masks, which aim to weight regions of interest appropriately in cluttered scenes. The third part is the second double attention module, which is similar to the first and further enhances the performance of the multi-column CNN module. In addition, we propose a progressive training method to mitigate the drawback of using geometry-adaptive kernels to generate the ground truth. Experimental results on three mainstream datasets (ShanghaiTech Part B, ShanghaiTech Part A and UCF_CC_50) suggest that the proposed 2-DA-CNN is competitive with state-of-the-art methods.

Keywords Crowd counting · Convolutional neural network · Double attention · Progressive training
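The abstract refers to geometry-adaptive kernels for generating ground truth without restating them. As a reading aid, the following is a minimal sketch of the commonly used formulation, in which each annotated head is blurred by a Gaussian whose standard deviation is proportional to the mean distance to its nearest annotated neighbours. The function names, the values k = 3 and beta = 0.3, the fallback sigma for isolated heads, and the per-head renormalisation of the truncated kernel are assumptions made for illustration. They are not taken from this paper, and the sketch does not cover the proposed progressive training method.

```python
import math

def geometry_adaptive_density(points, height, width, k=3, beta=0.3,
                              fallback_sigma=4.0):
    """Return a height x width density map (list of lists) for head points.

    points: list of (x, y) annotations, assumed to lie inside the image.
    k, beta, fallback_sigma: illustrative defaults, not values from the paper.
    """
    density = [[0.0] * width for _ in range(height)]
    for i, (px, py) in enumerate(points):
        # Distances to the k nearest other annotations.
        dists = sorted(
            math.hypot(px - qx, py - qy)
            for j, (qx, qy) in enumerate(points) if j != i
        )[:k]
        sigma = beta * sum(dists) / len(dists) if dists else fallback_sigma
        sigma = max(sigma, 0.5)  # guard against duplicate annotations

        # Evaluate the Gaussian on a +/- 3 sigma window clipped to the image.
        radius = max(1, math.ceil(3 * sigma))
        x0, x1 = max(0, int(px) - radius), min(width - 1, int(px) + radius)
        y0, y1 = max(0, int(py) - radius), min(height - 1, int(py) + radius)
        window = [
            (x, y, math.exp(-((x - px) ** 2 + (y - py) ** 2)
                            / (2 * sigma ** 2)))
            for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)
        ]
        total = sum(w for _, _, w in window)
        if total == 0.0:
            continue  # annotation outside the image: skip it

        # Renormalise so that every head contributes exactly 1 to the sum.
        for x, y, w in window:
            density[y][x] += w / total
    return density

def crowd_count(density):
    """The estimated count is the sum over the density map."""
    return sum(map(sum, density))
```

For example, `crowd_count(geometry_adaptive_density([(10, 10), (20, 12), (40, 30)], 64, 64))` is 3 up to floating-point error, because each head is normalised to unit mass. In sparse regions the neighbour distances, and therefore the kernels, become large, which is one way such ground truth can deviate from actual head sizes.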

* Chaofeng Li [email protected]

1 Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China

2 College of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China

3 School of Information Science and Engineering, Changzhou University, Changzhou 213164, China

Multimedia Tools and Applications

1 Introduction

Crowd counting aims to estimate the number of people in still images or dynamic videos. It can be used to analyze abnormal behavior, alleviate severe occlusion and reduce security risks in dense crowd scenes. With the development of artificial intelligence, much intelligent-systems research [1, 18] has come to have a great influence on our daily life. Crowd counting has also been widely used in crowd monitoring [4], scene understanding [22, 34] and safety management [3]. However, crowd counting remains a challenging task because of scale variation and cluttered backgrounds in crowd scenes. Crowd counting methods can be classified into three categories: detection-based methods [7, 8, 14, 20], regression-based methods [5, 19, 29] and density map estimation-based methods [10, 35,