Adaptive weighted crowd receptive field network for crowd counting

SHORT PAPER

Sifan Peng1 · Luyang Wang1 · Baoqun Yin1 · Yun Li1 · Yinfeng Xia1 · Xiaoliang Hao1

Received: 7 October 2019 / Accepted: 27 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Crowd counting plays an important role in crowd analysis and monitoring. To this end, we propose a novel method, the Adaptive Weighted Crowd Receptive Field Network (AWRFN), which estimates both the number of people and their spatial distribution in an input crowd image. AWRFN is composed of four modules: a backbone, a crowd receptive field block (CRFB), a recurrent block (RB), and a channel attention block (CAB). The backbone uses the first ten layers of VGG16 to extract base features from input images. The CRFB is a multi-branch architecture that simulates the human visual system to obtain refined and discriminative crowd features. The RB generates strong semantic and global information by recurrently stacking convolutional layers that share the same parameters. The CAB, using the outputs of the RB as guidance, produces weights that rescale each channel of the feature maps output by the CRFB. Unlike previous works that use Euclidean loss, we employ L1_Smooth loss to train our network in an end-to-end fashion. To demonstrate the effectiveness of the proposed method, we evaluate AWRFN on two representative datasets, ShanghaiTech and UCF_CC_50. The experimental results show that our method is both effective and robust compared with state-of-the-art approaches.

Keywords  Crowd counting · Convolutional neural network · Crowd receptive field block · Recurrent block · Channel attention block
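To illustrate the difference between Euclidean loss and the L1_Smooth loss mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes the commonly used Smooth L1 form (quadratic below a threshold, linear above it) with a threshold `beta = 1.0`; the exact form and threshold used for AWRFN are not stated in this section. The function names and the toy density values are hypothetical and chosen only for illustration.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired values.

    Assumed standard form: 0.5 * d^2 / beta when |d| < beta,
    and |d| - 0.5 * beta otherwise. `beta` is an assumption here.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta  # quadratic region for small errors
        else:
            total += d - 0.5 * beta      # linear region for large errors
    return total / len(pred)


def euclidean(pred, target):
    """Mean squared error, i.e. Euclidean loss up to a constant factor."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical flattened density-map values (not taken from the paper).
target = [0.0, 0.0, 0.0, 0.0]
pred_with_outlier = [0.0, 0.0, 0.0, 10.0]

loss_smooth = smooth_l1(pred_with_outlier, target)
loss_euclid = euclidean(pred_with_outlier, target)
print(loss_smooth, loss_euclid)
```

In this toy case a single large error dominates the Euclidean loss much more strongly than the Smooth L1 loss, since the latter grows only linearly beyond the threshold. This reduced sensitivity to outliers is one common reason for preferring Smooth L1.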

* Sifan Peng [email protected]

1 Department of Automation, University of Science and Technology of China, Hefei, China

1 Introduction

With the increasing population, more and more researchers are focusing on security issues in public places. To our knowledge, a key factor in addressing these issues is monitoring the number and distribution of crowds in real time. Although previous works [1–3] have made significant contributions to the development of crowd counting, various challenges remain, resulting from background interference, occlusion, large variations in crowd scale or perspective, and nonuniform population distribution, which leave considerable room for improvement. Among these issues, scale variation is the main factor affecting counting performance. Many methods [4–7] have been proposed to deal with scale variation and achieve good performance. Most of these methods use multi-column networks to simulate the receptive field changes of human eyes in order to cope with the problems caused by various crowd scales. Human eyes adjust the distribution of receptive fields in the human retinotopic map to deal with variations in object scale. Existing multi-column networks simulate this visual mechanism by designing several columns with filters of differ