Indoor scene understanding via RGB-D image segmentation employing depth-based CNN and CRFs
Wei Li 1,2,3 & Junhua Gu 4,5 & Yongfeng Dong 4,5 & Yao Dong 4,5 & Jungong Han 6
Received: 1 February 2019 / Revised: 5 April 2019 / Accepted: 10 June 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract With the availability of low-cost depth-visual sensing devices such as Microsoft Kinect, interest in indoor environment understanding is growing, and at its core is the semantic segmentation of RGB-D images. Recent research shows that the convolutional neural network (CNN) still dominates the field of image semantic segmentation. However, the downsampling performed in CNNs leads to unclear segmentation boundaries and poor classification accuracy. To address this problem, we propose a novel end-to-end deep architecture, termed FuseCRFNet, which seamlessly incorporates a fully-connected Conditional Random Field (CRF) model into a depth-based CNN framework. The proposed method exploits pixel-to-pixel relationships to increase the accuracy of image semantic segmentation. More importantly, we formulate the CRF as one of the layers of FuseCRFNet, so that it refines the coarse segmentation during forward propagation and, at the same time, propagates the errors backward to facilitate training. We evaluate FuseCRFNet on the SUN RGB-D dataset, and the results show that the proposed algorithm is superior to existing semantic segmentation algorithms, with an improvement in accuracy of at least 2%, further verifying its effectiveness.
Keywords Semantic segmentation · CNNs · RGB-D · Fully-connected conditional random field
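To make the refinement idea in the abstract more concrete, the following is a minimal sketch of how a fully-connected CRF can clean up a coarse segmentation. It is not the FuseCRFNet implementation: the abstract does not state the inference scheme, so mean-field inference with a Potts label compatibility and a single Gaussian kernel over pixel position and intensity is assumed here, as is common for fully-connected CRFs. The function name `mean_field_crf`, its parameters (`w`, `theta_pos`, `theta_int`), and the toy 6x6 grayscale input are all hypothetical choices for illustration.

```python
import numpy as np


def mean_field_crf(probs, image, n_iters=5, w=1.0, theta_pos=2.0, theta_int=0.2):
    """Refine coarse class probabilities with mean-field updates (illustrative sketch).

    probs: (H, W, L) coarse softmax output of a segmentation network.
    image: (H, W) grayscale guidance image (an assumption; a real system
           would use RGB and/or depth features).
    """
    H, W, L = probs.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    inten = image.reshape(-1, 1).astype(float)                      # (N, 1)

    # Dense pairwise kernel: nearby pixels with similar intensity interact strongly.
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    d_int = ((inten[:, None, :] - inten[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d_pos / (2 * theta_pos ** 2) - d_int / (2 * theta_int ** 2))
    np.fill_diagonal(K, 0.0)  # a pixel sends no message to itself

    flat = probs.reshape(-1, L)
    unary = -np.log(np.clip(flat, 1e-8, 1.0))  # unary energy from the coarse output
    Q = flat.copy()
    for _ in range(n_iters):
        msg = K @ Q  # message passing: kernel-weighted sum of neighbour beliefs
        # Potts compatibility: label l is penalised by the mass on all other labels.
        pairwise = w * (msg.sum(axis=1, keepdims=True) - msg)
        logits = -unary - pairwise
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        Q = np.exp(logits)
        Q /= Q.sum(axis=1, keepdims=True)            # normalise (softmax)
    return Q.reshape(H, W, L)


# Toy example: two regions, with one pixel in the left region mislabelled.
probs = np.zeros((6, 6, 2))
probs[:, :3, 0] = 0.8   # left half: mostly class 0
probs[:, 3:, 0] = 0.2   # right half: mostly class 1
probs[2, 1, 0] = 0.4    # noisy pixel: coarse output prefers class 1
probs[..., 1] = 1.0 - probs[..., 0]

image = np.zeros((6, 6))
image[:, 3:] = 1.0      # intensity edge aligned with the true boundary

coarse_labels = probs.argmax(axis=-1)
refined = mean_field_crf(probs, image)
refined_labels = refined.argmax(axis=-1)
```

In this toy case the noisy pixel is pulled back to the label of its similar-looking neighbours, while the intensity edge keeps the two regions from bleeding into each other. Each update consists of a matrix product, a subtraction, and a softmax, all of which are differentiable, which is what makes it possible in principle to place such a step inside a network as a layer and train through it.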
Multimedia Tools and Applications
* Wei Li [email protected] Extended author information available on the last page of the article
1 Introduction
In recent years, computer vision research has thrived under the influence of deep learning and big data [51]. The focus of research has shifted from simple object recognition to scene understanding, which plays an important role in fields such as autonomous driving, human-computer interaction, and intelligent robots [20]. Techniques for more accurate analysis and understanding of the environment have become popular, among which semantic segmentation has always been the most crucial task. With the aid of deep learning, image semantic segmentation is changing from manual labeling to automatic recognition. Especially since the convolutional neural network (CNN) achieved a breakthrough in image classification, computer vision has been heavily influenced by deep learning. Many excellent neural networks are now available specifically for image segmentation, such as FCN [23], SegNet [2], and DeepLab [7]. However, most semantic segmentation models operate on RGB images, which limits them because they cannot handle geometric information. Since more and mor