Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

1 The Australian National University (ANU), Canberra, Australia
2 CSIRO, Canberra, Australia
{fatemehsadat.saleh,mohammadsadegh.aliakbarian,lars.petersson,jose.alvarezlopez}@data61.csiro.au
3 CVLab, EPFL, Lausanne, Switzerland

Abstract. Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately, these priors either require training with pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract markedly more accurate masks from the pre-trained network itself, forgoing external objectness modules. This is accomplished using the activations of the higher-level convolutional layers, smoothed by a dense CRF. We demonstrate that our method, based on these masks and a weakly-supervised loss, outperforms the state-of-the-art tag-based weakly-supervised semantic segmentation techniques. Furthermore, we introduce a new form of inexpensive weak supervision yielding an additional accuracy boost.

Keywords: Semantic segmentation · Weak annotation · Convolutional neural networks · Weakly-supervised segmentation
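To make the core idea concrete, the following is a minimal sketch of how such a built-in foreground/background prior could be extracted from a pre-trained network's higher-level activations. The backbone (VGG-16), the fused layer (conv5_3), the channel-averaging step, and the fixed threshold are illustrative assumptions rather than the authors' exact procedure; the dense-CRF smoothing mentioned in the abstract is omitted and only noted in a comment.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def builtin_fg_bg_mask(image, layer_idx=28, threshold=0.5):
    """image: (1, 3, H, W) float tensor, ImageNet-normalized."""
    # Pre-trained classification network; no segmentation-specific training.
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
    with torch.no_grad():
        feats = image
        for i, layer in enumerate(vgg.features):
            feats = layer(feats)
            if i == layer_idx:  # stop at a high-level conv layer (conv5_3 here)
                break
        # Fuse the activation maps of that layer into one map by averaging
        # across channels: high-level activations tend to fire on objects.
        fused = feats.abs().mean(dim=1, keepdim=True)
        # Upsample to the input resolution and normalize to [0, 1].
        fused = F.interpolate(fused, size=image.shape[2:], mode="bilinear",
                              align_corners=False)
        fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    # Binary foreground/background mask. The paper additionally smooths this
    # prior with a dense CRF, which is omitted in this sketch.
    return (fused > threshold).float()
```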

1 Introduction

Semantic scene segmentation, i.e., assigning a class label to every pixel in an input image, has received growing attention in the computer vision community, with accuracy greatly increasing over the years [1–6]. In particular, fully-supervised approaches based on Convolutional Neural Networks (CNNs) have recently achieved impressive results [1–4,7]. Unfortunately, these methods require large amounts of training images with pixel-level annotations, which are expensive and time-consuming to obtain. Weakly-supervised techniques have therefore emerged as a solution to address this limitation [8–15]. These techniques rely on a weaker form of training annotation, such as, from weaker to stronger levels of supervision, image tags [12,14,16,17], information about object sizes [17], labeled points or squiggles [12], and labeled bounding boxes [13,18]. In the current Deep Learning era, existing weakly-supervised methods typically start from a network pre-trained on an object recognition dataset (e.g., ImageNet [19]) and fine-tune it using segmentation losses defined according to the weak annotations at hand [12–14,16,17]. In this paper, we are particularly interested in the weakest such level of supervision, i.e., image tags.
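As an illustration of what a segmentation loss "defined according to the weak annotations" can look like for image tags, here is a common tag-based formulation: per-pixel class scores are aggregated into image-level predictions (global max pooling in this sketch) and supervised with a multi-label classification loss against the tags. The pooling choice and the binary cross-entropy form are assumptions for illustration, not the specific loss of any method cited above.

```python
import torch
import torch.nn.functional as F

def tag_loss(pixel_logits, tags):
    """
    pixel_logits: (B, C, H, W) per-pixel class scores from the network.
    tags:         (B, C) binary image-level labels (1 if class is present).
    """
    # Aggregate each class score map into a single image-level score;
    # global max pooling credits the most confident pixel per class.
    image_logits = pixel_logits.amax(dim=(2, 3))
    # Multi-label classification loss against the image tags.
    return F.binary_cross_entropy_with_logits(image_logits, tags)

# Example: random scores for a batch of 2 images, 20 classes, 32x32 pixels.
scores = torch.randn(2, 20, 32, 32)
tags = torch.randint(0, 2, (2, 20)).float()
loss = tag_loss(scores, tags)
```

In a fine-tuning loop, one would backpropagate this loss through the pixel-level predictions, so the network is trained end-to-end with nothing more than image tags as supervision.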