Spatial division networks for weakly supervised detection

  • PDF / 2,064,248 Bytes
  • 14 Pages / 595.276 x 790.866 pts Page_size
  • 58 Downloads / 196 Views

DOWNLOAD

REPORT


(0123456789().,-volV)(0123456789().,-volV)

ORIGINAL ARTICLE

Spatial division networks for weakly supervised detection Yongsheng Liu1 • Wenyu Chen1 • Hong Qu1



S. M. Hasan Mahmud1 • Kebin Miao2

Received: 17 January 2020 / Accepted: 27 July 2020 Ó Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract With only global image-level annotations, weakly supervised learning of deep convolutional neural networks has shown enough capacity in classification and localization but lack of ability to present the detection explicitly. In this work, we propose a novel spatial division network, which is applied to detect bounding boxes only with weak supervision. The essence of our model is two innovative differentiable modules, determination network and parameterized division, which perform the spatial division in feature maps of classification networks. After training, the learned parameters of the spatial division would correspond to a set of predicted bounding box coordinates. To demonstrate the effectiveness of our model for multi-label classification and weakly supervised detection, we conduct extensive experiments on the multi-MNIST dataset. Experimental results show our spatial division networks can (1) help improve the accuracy of multi-label classification, (2) implement in an end-to-end way only with the image-level annotations, and (3) output accurate bounding box coordinate, thereby achieving multi-digits detection. Keywords Deep learning  Learning systems  Convolutional neural networks  Predictive models

1 Introduction Visual detection with deep convolutional neural networks (DCNNs) has made significant progress in the last decade [19, 32, 33]. These successes are not only due to the efficacious spatial feature extraction capability of DCNNs but the increasing number of large annotated image datasets. Adequate annotation (ground truth bounding boxes) for training is necessary for fully supervised methods to get state-of-art detection results. However, annotating these full supervision labels [42, 43] is labor intensive and timeconsuming, motivating us to explore the weakly supervised detection (WSD) method with DCNNs. Compared to the fully supervised method, weakly supervised detection only acquires images with image-level annotations indicating whether an object of a specified category is present in an image or not [22]. Besides, WSD is like the visual system

& Hong Qu [email protected] 1

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

2

China Coal Research Institute, Beijing, China

of humans, which first selecting locations of related regions in the ‘‘detection’’ stage and then determining the target in the ‘‘identification’’ stage [20]. Although this learning framework serves to be more economical and interpret, the outcome tends to be somewhat backward compared to fully supervised learning. The fundamental challenge of weakly supervised detection is that the predict bounding boxes have no corresponding ground