Learning Common and Specific Features for RGB-D Semantic Segmentation with Deconvolutional Networks
1 Nanyang Technological University, Singapore, Singapore
[email protected], [email protected], [email protected]
2 University of Technology Sydney (UTS), Ultimo, Australia
[email protected]
3 NVIDIA Corporation, Santa Clara, USA
[email protected]
Abstract. In this paper, we tackle the problem of RGB-D semantic segmentation of indoor images. We take advantage of deconvolutional networks, which can predict pixel-wise class labels, and develop a new structure for the deconvolution of multiple modalities. We propose a novel feature transformation network to bridge the convolutional networks and deconvolutional networks. In the feature transformation network, we correlate the two modalities by discovering common features between them, as well as characterize each modality by discovering modality-specific features. With the common features, we not only closely correlate the two modalities, but also allow them to borrow features from each other to enhance the representation of shared information. With the specific features, we capture the visual patterns that are visible in only one modality. The proposed network achieves competitive segmentation accuracy on NYU Depth Dataset V1 and V2.

Keywords: Semantic segmentation · Deep learning · Common feature · Specific feature
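To make the data flow concrete, the following is a minimal PyTorch sketch of this two-stream design: one encoder per modality, a feature transformation layer that splits each modality's encoding into a common part and a specific part, and a deconvolutional decoder per modality that consumes the fused common features together with its own specific features. The class names (FeatureTransform, TwoStreamSegNet), the single-block encoders/decoders, the layer sizes, and the fusion-by-averaging rule are all illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class FeatureTransform(nn.Module):
    """Projects an encoder feature map into common and specific components."""
    def __init__(self, channels):
        super().__init__()
        self.to_common = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_specific = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat):
        return self.to_common(feat), self.to_specific(feat)

class TwoStreamSegNet(nn.Module):
    def __init__(self, num_classes, channels=64):
        super().__init__()
        # Toy one-block encoders, stand-ins for full convolutional networks.
        self.enc_rgb = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.enc_depth = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.transform_rgb = FeatureTransform(channels)
        self.transform_depth = FeatureTransform(channels)
        # Toy one-layer deconvolutional decoders producing per-pixel class scores.
        self.dec_rgb = nn.ConvTranspose2d(2 * channels, num_classes, 3, padding=1)
        self.dec_depth = nn.ConvTranspose2d(2 * channels, num_classes, 3, padding=1)

    def forward(self, rgb, depth):
        common_r, specific_r = self.transform_rgb(self.enc_rgb(rgb))
        common_d, specific_d = self.transform_depth(self.enc_depth(depth))
        # Fuse the common features so each modality can borrow shared information.
        common = 0.5 * (common_r + common_d)
        logits_rgb = self.dec_rgb(torch.cat([common, specific_r], dim=1))
        logits_depth = self.dec_depth(torch.cat([common, specific_d], dim=1))
        return logits_rgb, logits_depth

# Usage with random inputs (14 classes, as a placeholder count):
net = TwoStreamSegNet(num_classes=14)
rgb, depth = torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320)
logits_rgb, logits_depth = net(rgb, depth)

In a full model, the encoders would be deep CNNs and the decoders multi-layer deconvolutional networks, with a training objective that encourages the two common-feature maps to agree; the sketch above fixes only the overall data flow.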
1 Introduction
Semantic segmentation of scenes is a fundamental task in image understanding: it assigns a class label to each pixel of an image. Previously, most research focused on outdoor scenarios [1–6]; recently, the semantic segmentation of indoor images has attracted increasing attention [3,7–15]. It is challenging for many reasons, including the randomness of object distribution, poor illumination, and occlusion. Figure 1 shows an example of indoor scene segmentation. Thanks to the Kinect and other low-cost RGB-D cameras, we can obtain not only the color images (Fig. 1(a)) but also the depth maps of indoor scenes (Fig. 1(b)). The additional depth information is independent of illumination, which can significantly alleviate the challenges in semantic segmentation.
Fig. 1. Example images from the NYU Depth Dataset V2 [7]. (a) shows an RGB image captured in a home office. (b) and (c) are the corresponding depth map and ground truth. (d)–(f) are the visualized RGB-specific feature, depth-specific feature, and common feature (the method used to obtain these features is discussed in Sect. 5.2). RGB-specific features encode texture-rich visual patterns, such as the objects on the desk (the red circle in (d)). Depth-specific features encode visual patterns that are more obvious in the depth map, such as the chair (the green circle in (e)). Common features encode visual patterns that are visible in both modalities, such as the edges (the yellow circles in (f)). (Color figure online)
With the availability of RGB-D indoor sc