Joint multi-task cascade for instance segmentation


SPECIAL ISSUE PAPER

Yaole Wen1,4 · Fuyuan Hu1,5 · Jinchang Ren2 · Xinru Shang1 · Linyan Li3 · Xuefeng Xi1

Received: 2 November 2019 / Accepted: 29 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Instance segmentation requires both accurate pixel-level classification and high-level semantic features at the instance level, which makes it very challenging; a cascade structure can effectively improve both. To make full use of the relationship between detection and segmentation, this paper proposes a joint multi-task cascade structure that does not simply cascade the detection and segmentation tasks but processes them jointly over multiple stages and, in particular, integrates the information from different stages of the mask branch. The structure exploits the strengths of each stage in both detection and segmentation, thereby improving the quality of mask prediction. Feature fusion is also introduced into the fully convolutional network (FCN) branch, where high-level and low-level features are fused to enrich the contextual information of the image's semantic features. Experiments on the COCO dataset demonstrate improved results.

Keywords  Cascade structure · Instance segmentation · Multi-task · Feature fusion

* Fuyuan Hu [email protected]
Yaole Wen [email protected]
Jinchang Ren [email protected]
Xinru Shang [email protected]

1  School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
2  Centre for Signal and Image Processing, University of Strathclyde, Glasgow, UK
3  Suzhou Institute of Trade & Commerce, Suzhou 215009, Jiangsu, China
4  Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
5  Suzhou Key Laboratory for Big Data and Information Service, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China

1 Introduction

Instance segmentation applies pixel-level annotation to the objects of interest in an image and further distinguishes between different individuals of the same type of target. This task is closely related to object detection and semantic segmentation. Existing methods can therefore be roughly classified into two types: detection-based methods and segmentation-based methods. A detection-based method uses a conventional detector to generate bounding boxes or region proposals and then predicts the object mask from the generated boxes. Many of these methods are based on convolutional neural networks (CNNs). Chen et al. [1] proposed the DeepLab network structure, which uses atrous (dilated) convolution kernels so that the feature map keeps its original size while increasing the receptive field of the convolution