Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation

Abstract. We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries. We show experimentally that training a deep convolutional neural network using the proposed loss function leads to substantially better segmentations than previous state-of-the-art methods on the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the working mechanism of our method by a detailed experimental study that illustrates how the segmentation quality is affected by each term of the proposed loss function as well as their combinations.

Keywords: Weakly-supervised image segmentation · Deep learning

1 Introduction

Computer vision research has recently made tremendous progress. Many challenging vision tasks can now be solved with high accuracy, provided that sufficient annotated data is available for training. Unfortunately, collecting large labeled datasets is time consuming and typically requires substantial financial investment. The creation of training data has therefore become a bottleneck for the further development of computer vision methods. Unlabeled visual data, however, can be collected in large amounts relatively quickly and cheaply. A promising direction in computer vision research is therefore to develop methods that can learn from unlabeled or only partially labeled data.

In this paper we focus on the task of semantic image segmentation. Image segmentation is a prominent example of an important vision task for which creating annotations is especially costly: as reported in [4,29], manually producing segmentation masks requires several worker-minutes per image. Therefore, a large body of previous research studies how to train segmentation models from weaker forms of annotation.


A particularly appealing setting is to learn image segmentation models using training sets with only per-image labels, as this form of weak supervision can be collected very efficiently. However, there is currently still a large performance gap between models trained from per-image labels and models trained from full segmentation masks. In this paper we demonstrate that this gap can be substantially reduced compared to previous state-of-the-art techniques. We propose a new composite loss function for training convolutional neural networks for the task of weakly-supervised image segmentation. Our approach relies on the following three insights (a schematic sketch of the resulting composite loss follows the list):

– Image classification neural networks trained from per-image labels can provide weak localization cues that serve as seeds for a segmentation.
– The information about which classes can and cannot occur in an image provides a signal for expanding objects beyond these seeds.
– The predicted segmentations should be constrained to coincide with object boundaries.
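
To make the structure of such a composite objective concrete, the following is a minimal PyTorch-style sketch, written purely for illustration. The function name, tensor layout, and individual term definitions are our own assumptions rather than the paper's exact formulation: in particular, the expansion term here uses simple global max pooling as a crude stand-in for an image-level aggregation of class scores, and the constrain term penalizes divergence from a precomputed boundary-aware target (e.g. a CRF-refined prediction).

import torch
import torch.nn.functional as F

def composite_loss(logits, seeds, image_labels, boundary_target):
    # Hypothetical sketch of a seed/expand/constrain objective.
    # logits:          (B, C, H, W) raw per-class segmentation scores
    # seeds:           (B, C, H, W) binary weak localization cues
    # image_labels:    (B, C) binary per-image class labels (float)
    # boundary_target: (B, C, H, W) boundary-aware target distribution,
    #                  e.g. a CRF-refined prediction, treated as constant
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()

    # Seed: cross-entropy evaluated only at the seed locations.
    loss_seed = -(seeds * log_probs).sum() / seeds.sum().clamp(min=1.0)

    # Expand: aggregate each class map to an image-level score (global
    # max pooling here, as a stand-in) and match the per-image labels.
    class_scores = probs.amax(dim=(2, 3))
    loss_expand = F.binary_cross_entropy(class_scores, image_labels)

    # Constrain: penalize divergence from the boundary-aware target.
    loss_constrain = F.kl_div(log_probs, boundary_target,
                              reduction="batchmean")

    return loss_seed + loss_expand + loss_constrain

The sketch only conveys how the three principles contribute additively to a single training loss; the terms used in the actual method are defined more carefully in the remainder of the paper.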