Hierarchical saliency mapping for weakly supervised object localization based on class activation mapping

Zhuo Cheng¹ · Hongjian Li¹ · Xiangyan Zeng¹ · Meiqi Wang¹ · Xiaolin Duan¹

Received: 18 September 2019 / Revised: 8 July 2020 / Accepted: 6 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Weakly supervised object localization is a fundamental research problem in computer vision. In this paper, we propose a hierarchical saliency mapping network for object localization, designed to avoid missing detailed information about the potential object. Starting from a classical convolutional network, we remove the fully connected layers and add multiple information extraction branches. The network extracts information from convolutional layers at different scales to generate Hierarchical Saliency Maps. These maps, which include the Hierarchical-Class Activation Map and the Hierarchical-Spatial Pyramid Saliency Map, fuse deep-level and low-level features to locate the object. The method is tested on Caltech-UCSD Birds 200, Caltech101 and ImageNet, and it improves localization accuracy over the Class Activation Map and the Spatial Pyramid Saliency Map. The method can also be applied to fine-grained classification, object tracking and other fields.

Keywords Object localization · Weak supervision · Saliency map · CNNs
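The core quantity behind a class activation map is a weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y), where w^c are the classifier weights for class c applied after global average pooling. The following is a minimal NumPy sketch of that computation, plus one possible way to combine maps taken from layers of different resolution. This section does not specify the paper's fusion rule, so min-max normalizing each map, upsampling it by nearest-neighbour to a common size and averaging is a hypothetical choice; all function names and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def class_activation_map(features, weights, class_idx):
    """CAM for one class: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    features: (K, H, W) feature maps from one convolutional layer.
    weights:  (C, K) classifier weights applied after global average pooling.
    """
    return np.tensordot(weights[class_idx], features, axes=([0], [0]))


def normalize(m):
    """Min-max normalize a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m, dtype=float)


def upsample_nearest(m, size):
    """Nearest-neighbour upsampling of a square map to (size, size).

    Assumes the map is square and size is an integer multiple of its side.
    """
    if m.shape[0] != m.shape[1] or size % m.shape[0] != 0:
        raise ValueError("expects a square map whose side divides size")
    factor = size // m.shape[0]
    return np.kron(m, np.ones((factor, factor)))


def fuse_hierarchical(maps, size):
    """Hypothetical fusion: average the normalized, upsampled per-layer maps."""
    return np.mean([upsample_nearest(normalize(m), size) for m in maps], axis=0)


# Toy example: a deep 4x4 layer and a shallower 8x8 layer, 3 classes.
rng = np.random.default_rng(0)
deep = rng.random((8, 4, 4))       # 8 channels at 4x4
shallow = rng.random((4, 8, 8))    # 4 channels at 8x8
w_deep = rng.random((3, 8))        # per-branch classifier weights (assumed)
w_shallow = rng.random((3, 4))

cams = [class_activation_map(deep, w_deep, 1),
        class_activation_map(shallow, w_shallow, 1)]
fused = fuse_hierarchical(cams, 16)
print(fused.shape)  # (16, 16)
```

Averaging is the simplest rule that keeps every layer's contribution on the same [0, 1] scale; the low-resolution deep map supplies coarse class evidence while the higher-resolution shallow map can add finer spatial detail, which is the motivation the abstract gives for fusing deep-level and low-level features.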

Correspondence: Hongjian Li
¹ Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, People’s Republic of China

Multimedia Tools and Applications

1 Introduction
Object localization is a challenging task in computer vision, and it differs from object detection. Object detection separates every object of interest in an image from the background and then identifies the category of each one [38]; object localization separates a single object of interest from the background [38]. Object localization methods fall into two kinds: weakly supervised and strongly supervised. Unlike strongly supervised localization, weakly supervised localization needs only image-level annotations [7, 29]: it does not require the position or size of the object, such as a bounding box. Doing without this information reduces both the human annotation workload and the computation. Weakly supervised methods are also more widely applicable than strongly supervised ones, because only a few datasets provide bounding boxes while most provide only image-level labels. Convolutional neural networks (CNNs) [8, 27, 41], intelligent algorithms [2, 4, 6, 40, 48] and feature fusion algorithms [21, 22] are currently in wide use. Studies [15, 17, 32, 42] have shown that the active areas of a CNN's feature maps correspond to the position of the object in the original image, which indicates that CNNs are capable of localization. H
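Because the active area of a feature map tracks the object's position, a bounding box can be read off a saliency or activation map by thresholding it. The following is a minimal sketch under stated assumptions: the map is non-negative with a positive peak, the threshold is a fraction of the maximum (0.2 here is an illustrative value), and the box simply encloses every above-threshold pixel, whereas CAM-style pipelines often keep only the largest connected component. The helper names are hypothetical. Intersection over union (IoU) against a ground-truth box is included because IoU ≥ 0.5 is a common criterion for counting a localization as correct.

```python
import numpy as np


def bbox_from_saliency(saliency, ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) enclosing pixels >= ratio * max.

    Simplification: all above-threshold pixels are kept, not just the
    largest connected component.
    """
    peak = saliency.max()
    if peak <= 0:
        raise ValueError("expects a non-negative map with a positive peak")
    ys, xs = np.nonzero(saliency >= ratio * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def iou(a, b):
    """Intersection over union of two inclusive pixel boxes (x0, y0, x1, y1)."""
    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    return inter / (area(a) + area(b) - inter)


# Toy map: a bright 3x4 blob (rows 2-4, columns 3-6) on a dim background.
sal = np.full((10, 10), 0.05)
sal[2:5, 3:7] = 1.0
box = bbox_from_saliency(sal)
print(box)                               # (3, 2, 6, 4)
print(iou(box, (3, 2, 6, 4)) >= 0.5)     # True
```

The threshold ratio trades off coverage against precision: a low ratio grows the box toward weakly activated context, while a high ratio shrinks it to the most discriminative part of the object, which is the kind of missed detail the hierarchical maps aim to recover.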