Instance-Sensitive Fully Convolutional Networks


1 Microsoft Research, Beijing, China
2 Tsinghua University, Beijing, China
3 University of Science and Technology of China, Hefei, China

Abstract. Fully convolutional networks (FCNs) have proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances. On top of these instance-sensitive score maps, a simple assembling module is able to output an instance candidate at each position. In contrast to the recent DeepMask method for segmenting instances, our method does not have any high-dimensional layer related to the mask resolution, but instead exploits image local coherence for estimating instances. We present competitive results of instance segment proposal on both PASCAL VOC and MS COCO.
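The assembling step mentioned above can be illustrated with a minimal NumPy sketch. It rests on assumptions that this text does not confirm: the score maps are stored row-major over a k×k grid of relative positions, the window side m is divisible by k, and each grid cell of the window is copied from the same location in its corresponding score map. The function name `assemble_instance` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def assemble_instance(score_maps, top, left, m, k=3):
    """Assemble one m x m instance mask from k*k instance-sensitive score maps.

    score_maps: array of shape (k*k, H, W). Map index r*k + c is ASSUMED to
    score the relative position (row r, column c) of a k x k grid.
    (top, left): upper-left corner of the sliding window in the score maps.
    """
    n_maps, height, width = score_maps.shape
    if n_maps != k * k:
        raise ValueError("expected k*k score maps")
    if m % k != 0:
        raise ValueError("this sketch assumes m is divisible by k")
    if top < 0 or left < 0 or top + m > height or left + m > width:
        raise ValueError("window falls outside the score maps")

    cell = m // k  # side length of one grid cell inside the window
    mask = np.empty((m, m), dtype=score_maps.dtype)
    for r in range(k):
        for c in range(k):
            ys = slice(r * cell, (r + 1) * cell)
            xs = slice(c * cell, (c + 1) * cell)
            # Copy this cell from the map responsible for position (r, c),
            # reading the same image location that the cell covers.
            mask[ys, xs] = score_maps[
                r * k + c,
                top + ys.start:top + ys.stop,
                left + xs.start:left + xs.stop,
            ]
    return mask
```

In this sketch the assembling involves no learned parameters: it only copies values, so the output size m×m does not require any layer whose dimension depends on the mask resolution.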

1 Introduction

Fully convolutional networks (FCNs) [1] have proven to be an effective end-to-end solution to semantic image segmentation. An FCN produces a score map of a size proportional to the input image, where every pixel represents a classifier of objects. Despite good accuracy and ease of use, FCNs are not directly applicable for producing instance segments (Fig. 1 (top)). Previous instance semantic segmentation methods (e.g., [2–5]) in general resorted to off-the-shelf segment proposal methods (e.g., [6,7]).

In this paper, we develop an end-to-end fully convolutional network that is capable of segmenting candidate instances. Like the FCN in [1], in our method every pixel still represents a classifier; but unlike an FCN that generates one score map (for one object category), our method computes a set of instance-sensitive score maps, where each pixel is a classifier of relative positions to an object instance (Fig. 1 (bottom)). For example, with a 3×3 regular grid depicting relative positions, we produce a set of 9 score maps in which, e.g., the map #6 in Fig. 1 has high scores on the “right side” of object instances. With this set of score maps, we are able to generate an object instance segment in each sliding window by assembling the output from the score maps. This procedure enables a fully convolutional way of producing segment instances.

Fig. 1. Methodological comparisons between: (top) FCN [1] for semantic segmentation; (bottom) our InstanceFCN for instance segment proposal.

This work was done when Yi Li and Shaoqing Ren were interns at Microsoft Research.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 534–549, 2016. DOI: 10.1007/978-3-319-46466-4_32

Most related to our method, DeepMask [8] is an instance segment proposal method driven by convolutional networks. DeepMask learns a function that maps an image sliding window to