Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation
Abstract. CNN architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance detection driven architectures.
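The refinement mechanism summarized above (skip connections plus multiplicative gating over a Laplacian-pyramid-style reconstruction) can be sketched concretely. The following PyTorch-style module is a minimal sketch under our own assumptions: the module name, the sigmoid-of-max-pool gate, and all layer sizes are illustrative choices, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedRefinement(nn.Module):
    """Sketch of one refinement stage: a higher-resolution skip branch
    predicts a residual that is multiplicatively gated and added to the
    upsampled coarse prediction, in the spirit of a Laplacian pyramid.
    Names, gate construction, and sizes are illustrative assumptions."""

    def __init__(self, skip_channels, num_classes):
        super().__init__()
        # per-class scores predicted from the higher-resolution feature map
        self.score = nn.Conv2d(skip_channels, num_classes, kernel_size=3, padding=1)

    def forward(self, coarse_scores, skip_feats):
        # 1) upsample coarse class scores to the skip connection's resolution
        up = F.interpolate(coarse_scores, size=skip_feats.shape[-2:],
                           mode='bilinear', align_corners=False)
        # 2) soft per-class gate: dilate the upsampled scores with max pooling
        #    and squash to (0,1), so the residual can only edit a class where
        #    that class is already plausible nearby (one simple choice; the
        #    paper's exact masking scheme differs)
        gate = torch.sigmoid(F.max_pool2d(up, kernel_size=5, stride=1, padding=2))
        # 3) high-frequency residual from the skip branch, gated and added
        residual = self.score(skip_feats)
        return up + gate * residual
```

Stacking several such stages, each fed by a progressively higher-resolution skip connection, matches the abstract's description of successively refining segment boundaries reconstructed from lower-resolution maps while leaving confident interior regions largely unchanged.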
Keywords: Semantic segmentation · Convolutional neural networks

1 Introduction
Deep convolutional neural networks (CNNs) have proven highly effective at semantic segmentation due to the capacity of discriminatively pre-trained feature hierarchies to robustly represent and recognize objects and materials. As a result, CNNs have significantly outperformed previous approaches (e.g., [2,3,28]) that relied on hand-designed features and recognizers trained from scratch. A key difficulty in the adaptation of CNN features to segmentation is that feature pooling layers, which introduce invariance to spatial deformations required for robust recognition, result in high-level representations with reduced spatial resolution. In this paper, we investigate this spatial-semantic uncertainty principle for CNN hierarchies (see Fig. 1) and introduce two techniques that yield substantially improved segmentations.

First, we tackle the question of how much spatial information is represented at high levels of the feature hierarchy. A given spatial location in a convolutional feature map corresponds to a large block of input pixels (and an even larger "receptive field"). While max pooling in a single feature channel clearly destroys spatial information in that channel, spatial filtering prior to pooling introduces strong correlations across channels which could, in principle, encode significant "sub-pixel" localization information.

Fig. 1. In this paper, we explore the trade-off between spatial and semantic accuracy within CNN feature hierarchies. Such hierarchies generally follow a spatial-semantic uncertainty principle in which high levels of the hierarchy make accurate semantic predictions but are poorly localized in space while at low levels, boundaries are precise but labels are noisy. We develop reconstruction techniques for increasing spatial accuracy at a given level and refinement techniques for fusing multiple levels that limit these trade-offs and produce improved semantic segmentations.
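The preceding paragraph argues that cross-channel correlations can encode sub-pixel localization. To make that claim concrete, the sketch below linearly decodes each coarse spatial cell's high-dimensional feature vector into an s x s block of finer-resolution class scores. This is the standard sub-pixel-convolution (depth-to-space) formulation, used here as an illustrative stand-in for the paper's learned reconstruction; the channel count, class count (21 for PASCAL VOC), and upsampling factor are assumptions about a typical setup.

```python
import torch
import torch.nn as nn

class SubpixelReconstruction(nn.Module):
    """Illustration of the 'sub-pixel' claim: one coarse spatial cell with
    many channels is linearly decoded into an s x s block of higher-resolution
    class scores (sub-pixel convolution / depth-to-space). All sizes here
    are assumptions, not the paper's exact reconstruction."""

    def __init__(self, in_channels, num_classes, upsample=4):
        super().__init__()
        # one 1x1 linear map per coarse cell onto s*s*num_classes outputs
        self.decode = nn.Conv2d(in_channels, num_classes * upsample ** 2,
                                kernel_size=1)
        # rearrange the s*s sub-blocks into spatial resolution
        self.shuffle = nn.PixelShuffle(upsample)

    def forward(self, feats):                     # feats: (B, C, H, W)
        return self.shuffle(self.decode(feats))   # -> (B, K, s*H, s*W)

# e.g., decode 512-channel conv5-style features (stride 16) to stride-4 scores:
scores = SubpixelReconstruction(512, num_classes=21, upsample=4)(
    torch.randn(1, 512, 32, 32))
print(scores.shape)  # torch.Size([1, 21, 128, 128])
```

If the high-dimensional features truly carry sub-pixel information, such a decoder can recover boundaries finer than the feature map's nominal stride; if they did not, the decoded blocks would be spatially constant within each coarse cell.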