Towards Perspective-Free Object Counting with Deep Learning

In this paper we address the problem of counting objects instances in images. Our models are able to precisely estimate the number of vehicles in a traffic congestion, or to count the humans in a very crowded scene. Our first contribution is the proposal

  • PDF / 6,018,994 Bytes
  • 15 Pages / 439.37 x 666.142 pts Page_size
  • 6 Downloads / 196 Views

DOWNLOAD

REPORT


stract. In this paper we address the problem of counting objects instances in images. Our models are able to precisely estimate the number of vehicles in a traffic congestion, or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps. Our second contribution consists in a scale-aware counting model, the Hydra CNN, able to estimate object densities in different very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve a state-of-the-art performance.

1

Introduction

Take an image of a crowded scene, or of a traffic jam. We address here the hard problem of accurately counting the objects instances in these scenarios. To develop this type of ideas makes possible to build applications that span from solutions to improve security in stadiums, to systems that precisely monitor how the traffic congestions evolve. Note that the covered applications define the typical scenarios where individual object detectors (e.g. [1,2]) do not work reliably. The reasons are: the extreme overlap of objects, the size of the instances, scene perspective, etc. Thus, approaches modeling the counting problem as one of object density estimation have been systematically defining the state-of-the-art [3–7]. For this reason, we propose here two deep learning models for object density map estimation. As illustrated in Fig. 1, we tackle the counting problem proposing deep learning architectures able to learn the regression function that projects the image Electronic supplementary material The online version of this chapter (doi:10. 1007/978-3-319-46478-7 38) contains supplementary material, which is available to authorized users. c Springer International Publishing AG 2016  B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 615–629, 2016. DOI: 10.1007/978-3-319-46478-7 38

616

D. O˜ noro-Rubio and R.J. L´ opez-Sastre

Fig. 1. We define the object counting task like a regression problem where a deep learning model has to learn how to map image patches to object densities.

appearance into an object density map. This allows the derivation of an estimated object density map for unseen images. The main contributions of this work are as follows. First, in Sect. 3.2, we propose a novel deep network architecture, named Counting CNN (CCNN), which is an efficient fully-convolutional neural network able to perform an accurate regression of object density maps from image patches. Second, we show that object densities can be estimated without the need of any pe