Scale channel attention network for image segmentation

Jianjun Chen1,2 · Youliang Tian3 · Wei Ma1 · Zhengdong Mao1 · Yue Hu1

Received: 7 May 2019 / Revised: 18 February 2020 / Accepted: 7 April 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Object scale variation degrades image segmentation performance. The spatial pyramid pooling module and the attention mechanism are two components widely used in deep neural networks to handle this problem, but applying either component alone commonly yields limited benefit. To push the limit, we propose a scale channel attention network (SCA-Net), which enhances the fused multi-scale features by using channel attention components. After the multi-scale pooling step, the multi-scale spatial information is distributed across different feature channels. The channel attention block is then employed to guide SCA-Net to focus on the object-relevant scale channels. We further explore the channel attention block and find a simple yet effective structure that combines global average pooling and global maximum pooling, resulting in a robust global information encoder. SCA-Net does not require any time-consuming post-processing, that is, an extra step after the neural network that optimizes the segmentation result. On the PASCAL VOC 2012 and Cityscapes benchmarks, SCA-Net achieves test set performance of 75.5% and 77.0%, respectively.

Keywords Image segmentation · Convolutional neural network · Attention mechanism · Spatial pyramid pooling · Multi-source and heterogeneous data
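To make the idea of a channel attention block built on global average pooling and global maximum pooling concrete, here is a minimal sketch in plain Python. It is not the SCA-Net implementation: the abstract does not say how the two pooled descriptors are combined, so the element-wise sum followed by a sigmoid gate is an assumption, the learned layers a real block would contain are omitted, and all function names are hypothetical.

```python
import math


def global_avg_pool(fmap):
    """Reduce each channel (an H x W grid) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """Reduce each channel (an H x W grid) to its maximum value."""
    return [max(max(row) for row in ch) for ch in fmap]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Gate every channel of a C x H x W feature map with one scalar weight.

    Assumption: the average and maximum descriptors are fused by a plain
    sum before the sigmoid. The paper's actual fusion structure and its
    learned parameters are not reproduced here.
    """
    avg = global_avg_pool(fmap)
    mx = global_max_pool(fmap)
    weights = [sigmoid(a + m) for a, m in zip(avg, mx)]
    gated = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fmap)]
    return weights, gated


# Two channels of size 2 x 2: one silent, one with strong responses.
fmap = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 3.0], [2.0, 2.0]],
]
weights, gated = channel_attention(fmap)
```

In this toy input the silent channel receives a weight of 0.5 and the active channel a weight close to 1, so channels with stronger responses are emphasized. In a trained network the weights would come from learned layers rather than from the raw pooled values.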

Youliang Tian
[email protected]

1 National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China

2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

3 Guizhou Provincial Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou, 550025, China

Multimedia Tools and Applications

1 Introduction Image segmentation is an essential topic in image content understanding and has broad prospects in image editing, autonomous driving, and multi-source and heterogeneous image analytics. Recently, state-of-the-art results in image segmentation have mainly been achieved by convolutional neural networks (CNNs) [24], [27], [21], [4]. Although researchers have made progress in the task, effectively fusing global and local features and improving the scale robustness of CNNs remain difficult. Especially for multi-source images, object scale variation is very complex. As shown in Fig. 1, there are large-scale stuff/things such as cars and roads and tiny-scale objects such as pedestrians. Generally, if a CNN has a larger receptive field, it gains more global information and generates a better representation for the large-scale stuff/things. For the tiny-scale object, a larger receptive field also captures more global information, however, which contain