Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation

In this paper, we deal with a weakly supervised semantic segmentation problem where only training images with image-level labels are available. We propose a weakly supervised semantic segmentation method which is based on CNN-based class-specific saliency

  • PDF / 3,537,708 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 20 Downloads / 238 Views

DOWNLOAD

REPORT


Abstract. In this paper, we deal with a weakly supervised semantic segmentation problem where only training images with image-level labels are available. We propose a weakly supervised semantic segmentation method which is based on CNN-based class-specific saliency maps and fully-connected CRF. To obtain distinct class-specific saliency maps which can be used as unary potentials of CRF, we propose a novel method to estimate class saliency maps which improves the method proposed by Simonyan et al. (2014) significantly by the following improvements: (1) using CNN derivatives with respect to feature maps of the intermediate convolutional layers with up-sampling instead of an input image; (2) subtracting the saliency maps of the other classes from the saliency maps of the target class to differentiate target objects from other objects; (3) aggregating multiple-scale class saliency maps to compensate lower resolution of the feature maps. After obtaining distinct class saliency maps, we apply fully-connected CRF by using the class maps as unary potentials. By the experiments, we show that the proposed method has outperformed state-of-the-art results with the PASCAL VOC 2012 dataset under the weakly-supervised setting. Keywords: Semantic segmentation · Weakly supervised segmentation · Fully convolutional neural network · Fully connected CRF

1

Introduction

Due to the recent advent of deep learning methods, convolutional neural network (CNN) based methods have outperformed most of the previous state-of-the-art in various kinds of image recognition tasks. In the task of semantic segmentation, CNN achieved about 50 % improvement [3,4]. Semantic image segmentation is a task to add object class labels to each of all the pixels in a given image, which is more challenging task than object classification and object detection. Semantic segmentation is expected to contribute detailed analysis of images in Electronic supplementary material The online version of this chapter (doi:10. 1007/978-3-319-46493-0 14) contains supplementary material, which is available to authorized users. c Springer International Publishing AG 2016  B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 218–234, 2016. DOI: 10.1007/978-3-319-46493-0 14

Distinct Class-Specific Saliency Maps

219

(A) sample Simonyan et al. [1] GrabCut our saliency maps (H) CRF image (B) motorbike (C) person (D) motorbike (E) person (F) motorbike (G) person result

Fig. 1. (From the left) (A) sample image, (B), (C) its class saliency maps with respect to “motorbike” and “person” by [1], (D), (E) estimated regions of them by GrabCut, (F), (G) class saliency maps by the proposed method, and (H) estimated regions by Dense CRF.

various practical tasks such as food calorie estimation [5,6]. However, most of the CNN based semantic segmentation methods assume that pixel-wise annotation is available, which is costly to obtain in general. On the other hand, collecting images with image-level annotation is easier than those with pixel-level annotation, since many images attached wit