Incorporating Side Information by Adaptive Convolution
Di Kang1,2 · Debarun Dhar1 · Antoni B. Chan1
Received: 8 January 2019 / Accepted: 30 May 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Computer vision tasks often have side information available that is helpful for solving the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in deep-learning-based counting systems. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information. In particular, we model the filter weights as a low-dimensional manifold within the high-dimensional space of filter weights. The filter weights are generated using a learned “filter manifold” sub-network, whose input is the side information. With the help of side information and adaptive weights, the ACNN can disentangle the variations related to the side information, and extract discriminative features related to the current context (e.g., camera perspective, noise level, blur kernel parameters). We demonstrate the effectiveness of ACNN incorporating side information on three tasks: crowd counting, corrupted digit recognition, and image deblurring. Our experiments show that ACNN improves performance compared to a plain CNN with a similar number of parameters, and achieves performance similar to or better than the state of the art on the crowd counting task. Since existing crowd counting datasets do not contain ground-truth side information, we collect a new dataset with the ground-truth camera angle and height as the side information. We also perform ablation experiments, mainly for crowd counting, to study the helpfulness of the side information and the effect of the placement of the adaptive convolutional layers, in order to gain insight into ACNNs.
Keywords Convolutional neural network (CNN) · Deep learning · Crowd counting
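The idea of generating filter weights from side information can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (make_filter_manifold, generate_filter, conv2d_valid, adaptive_conv), the scalar side-information input, the hidden size, and the single-channel 3x3 kernel are all assumptions chosen for brevity. The sub-network weights are random rather than learned, and the code uses only the Python standard library.

```python
import math
import random


def make_filter_manifold(hidden=8, ksize=3, seed=0):
    """Hypothetical two-layer MLP that maps a scalar side-information value
    to ksize*ksize filter weights. Weights are random here (an assumption);
    in an ACNN they would be learned with the rest of the network."""
    rng = random.Random(seed)
    n_out = ksize * ksize
    return {
        "ksize": ksize,
        "w1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "b1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "w2": [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }


def generate_filter(params, z):
    """Evaluate the sub-network at side information z; reshape to a 2D kernel."""
    hidden = [math.tanh(w * z + b) for w, b in zip(params["w1"], params["b1"])]
    flat = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["w2"], params["b2"])
    ]
    k = params["ksize"]
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation (the usual CNN convolution)."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(cols - k + 1)
        ]
        for i in range(rows - k + 1)
    ]


def adaptive_conv(params, image, z):
    """Adaptive convolution: the kernel is regenerated from z for each input."""
    return conv2d_valid(image, generate_filter(params, z))


params = make_filter_manifold()
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
out_low = adaptive_conv(params, image, z=0.1)   # e.g. a normalised camera angle
out_high = adaptive_conv(params, image, z=0.9)  # same image, different context
```

Because the kernel is a smooth function of a low-dimensional input z, the set of kernels the layer can produce traces out a low-dimensional surface in the nine-dimensional space of 3x3 filters, which is the sense in which the abstract speaks of a filter manifold.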
1 Introduction
Computer vision tasks often have side information available that is helpful to solve the task. Here we define “side information” as auxiliary metadata that is associated with the main input, and that affects the appearance/properties of the main input.
Communicated by S. Soatto.
1 Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
2 Tencent AI Lab, Shenzhen, China
For example, the camera angle affects the appearance of a person in an image (see Fig. 1 top). Even within the same scene, a person’s appearance changes as they move along the ground-plane, due to changes in the relative angles to the camera sensor. Most deep learning methods ignore the side information, since