Salient object detection for RGB-D images by generative adversarial network



Zhengyi Liu1 · Jiting Tang1 · Qian Xiang1 · Peng Zhao1

Received: 22 July 2019 / Revised: 17 April 2020 / Accepted: 5 June 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Salient object detection for RGB-D images aims to automatically detect the objects of human interest using color and depth information. In this paper, a generative adversarial network is adopted to improve detection performance through adversarial learning. The generator network takes RGB-D images as input and outputs synthetic saliency maps. It adopts a double-stream network to extract color and depth features separately and then fuses them progressively from deep to shallow layers. The discriminator network takes as input either an RGB image paired with a synthetic saliency map (RGBS) or an RGB image paired with the ground truth saliency map (RGBY), and outputs a label indicating whether the input is synthetic or ground truth. It consists of three convolution blocks and three fully connected layers. To capture long-range dependencies among features, a self-attention layer is inserted in both the generator and the discriminator networks. Supervised by real labels and ground truth saliency maps, the discriminator and generator networks are trained adversarially, so that the generator learns to fool the discriminator while the discriminator learns to distinguish synthetic maps from ground truth. Experiments demonstrate that adversarial learning enhances the ability of the generator network, and that the RGBS and RGBY inputs to the discriminator and the self-attention layer play an important role in improving performance. Our method also outperforms state-of-the-art methods.

Keywords Generative adversarial network; Salient object detection; RGB-D image; Self-attention; Double stream network
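The adversarial training described in the abstract can be illustrated with a minimal sketch. It assumes the standard binary cross-entropy GAN formulation, which the abstract does not confirm as the authors' exact loss. The function names and the `adv_weight` parameter are hypothetical, and the discriminator outputs are passed in as plain probabilities rather than computed by a network.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy of one probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def stack_rgb_with_map(rgb_pixel, saliency_value):
    """Form one 4-channel pixel (R, G, B, S), as in the RGBS/RGBY inputs."""
    return (*rgb_pixel, saliency_value)


def discriminator_loss(d_on_rgby, d_on_rgbs):
    """Assumed objective: label ground truth (RGBY) as 1, synthetic (RGBS) as 0."""
    return bce(d_on_rgby, 1.0) + bce(d_on_rgbs, 0.0)


def generator_loss(d_on_rgbs, pred_map, gt_map, adv_weight=1.0):
    """Assumed objective: supervised pixel-wise BCE plus an adversarial term.

    adv_weight is a hypothetical balancing factor, not taken from the paper.
    """
    supervised = sum(bce(p, y) for p, y in zip(pred_map, gt_map)) / len(pred_map)
    adversarial = bce(d_on_rgbs, 1.0)  # generator wants D to answer "ground truth"
    return supervised + adv_weight * adversarial


if __name__ == "__main__":
    pred = [0.8, 0.2, 0.9]  # flattened synthetic saliency map (toy values)
    gt = [1.0, 0.0, 1.0]    # flattened ground truth saliency map
    print(stack_rgb_with_map((0.1, 0.5, 0.3), pred[0]))
    print(discriminator_loss(d_on_rgby=0.9, d_on_rgbs=0.1))
    print(generator_loss(d_on_rgbs=0.1, pred_map=pred, gt_map=gt))
```

Under these assumptions, the discriminator loss falls as it separates RGBY from RGBS more confidently, while the generator loss falls both when its map approaches the ground truth and when the discriminator is fooled.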

Zhengyi Liu [email protected]
Jiting Tang [email protected]
Qian Xiang [email protected]
Peng Zhao [email protected]

1 Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China

Multimedia Tools and Applications

1 Introduction

Salient object detection (SOD) mimics human intelligence and detects the most attractive objects or instances [29, 37]. It filters out irrelevant information and reduces the complexity of visual analysis, and it serves as an important pre-processing step in many problems, such as object detection [4, 70] and VR exploration [73]. SOD research includes RGB SOD [15, 17, 83]; RGB-D SOD [8, 11, 20, 39, 51, 63, 82, 85], in which depth information can be utilized; light-field SOD [38, 52, 64], in which light-field information is involved; high-resolution image SOD [35, 75], in which high-resolution images are handled directly; co-saliency detection [36, 68, 71, 77], in which intra-image and inter-image saliency constraints need to be considered simultaneously; and video saliency detection [21, 57, 59, 65, 66], in which temporal and spatial relationships are explored. Our work focuses on RGB-D image object-aw