Two-branch encoding and iterative attention decoding network for semantic segmentation


ORIGINAL ARTICLE

Hegui Zhu1 • Min Zhang1 • Xiangde Zhang1 • Libo Zhang2

Received: 9 March 2020 / Accepted: 19 August 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Deep convolutional neural networks (DCNNs) have shown outstanding performance in semantic image segmentation. In this paper, we propose a two-branch encoding and iterative attention decoding semantic segmentation model. In the encoding stage, an improved PeleeNet is used as the backbone branch to extract dense image features, and a spatial branch is used to preserve fine-grained information. In the decoding stage, iterative attention decoding is employed to optimize the segmentation results with multi-scale features. Furthermore, we propose a channel position attention module and a boundary residual attention module to learn different position and boundary features, which enrich the boundary position information of the target. We use SegNet as the basic network and conduct experiments on the CamVid dataset to evaluate the effect of each component of the proposed model in terms of accuracy and mIOU. We then verify the segmentation performance of the proposed model with comparative experiments on the CamVid, Cityscapes and PASCAL VOC 2012 datasets. In particular, the model achieves 91.7% segmentation accuracy and 67.1% mIOU on the CamVid dataset, which verifies the effectiveness of the proposed model. In the future, target detection could be combined with semantic segmentation to further improve the segmentation of small objects. We also hope to further optimize the model structure and reduce its time complexity and number of parameters while maintaining its effectiveness.

Keywords Semantic segmentation • Two-branch encoding • Improved PeleeNet • Iterative attention decoding • Channel position attention • Boundary residual attention
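For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how pixel accuracy and mean intersection over union (mIOU) are commonly computed from a confusion matrix. It assumes the standard definitions; this excerpt does not state how the paper treats void labels or whether its accuracy is global or class-averaged, so the function names and the handling of absent classes here are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over the classes that occur.

    Assumption: classes absent from both the labels and the predictions
    are skipped rather than counted as zero.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three classes.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

In the toy example four of six pixels are correct, while the per-class IoU values are 1/3, 2/3 and 1/2, so mIOU is lower than accuracy. The same pattern appears in the CamVid figures above (91.7% accuracy versus 67.1% mIOU), since mIOU penalizes errors on every class equally regardless of how many pixels the class covers.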

Xiangde Zhang (corresponding author) [email protected]
Hegui Zhu [email protected]
Min Zhang [email protected]
Libo Zhang [email protected]

1 College of Sciences, Northeastern University, Shenyang 110819, China
2 Department of Radiology, The General Hospital of Northern Theater Command PLA, Shenyang 110016, China

1 Introduction

Semantic image segmentation plays a significant role in computer vision and is often used in scene understanding [1–3], object detection [4–6] and autonomous driving [7]. Recently, deep convolutional neural networks (DCNNs) have achieved significant success and extensive applications in image classification [8–11], but they have some limitations when solving dense prediction tasks. In particular, semantic image segmentation needs more dense features and spatial information; however, DCNNs such as VGG and ResNet have complex structures and lack dense features. During the encoding stage, with the consecutive pooling layers and strided convolutions, the input image will lose fine-grained image structure, global conte