C-FCN: Corners-based fully convolutional network for visual object detection

Lin Jiao 1,2 & Rujing Wang 1 & Chengjun Xie 1

Received: 16 March 2020 / Revised: 27 June 2020 / Accepted: 29 July 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Object detection has made significant progress in recent years. Proposal-based methods have become the mainstream object detectors, achieving excellent performance in accurately recognizing and localizing objects. However, region proposal generation remains a bottleneck. In this paper, to address the limitations of the conventional region proposal network (RPN), which defines dense anchor boxes with different scales and aspect ratios, we propose an anchor-free proposal generator named corner region proposal network (CRPN), which is based on a pair of key-points: the top-left corner and the bottom-right corner of an object bounding box. First, we predict the top-left corners and bottom-right corners separately with two sibling convolutional layers; we then obtain a set of object proposals through a grouping strategy and the non-maximum suppression algorithm. Finally, we merge CRPN and a fully convolutional network (FCN) into a unified network, achieving end-to-end object detection. Our method has been evaluated on the standard PASCAL VOC and MS COCO datasets using a deep residual network. Experimental results show that the proposed method outperforms previous detectors in terms of precision. Additionally, it runs at 76 ms per image on a single GPU with ResNet-50 as the backbone, which is faster than other detectors.

Keywords Object detection . Anchor-free . Corners . Region proposals . Fully convolutional network
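The proposal step summarized in the abstract (pair predicted corners, then apply non-maximum suppression) can be illustrated with a minimal sketch. This is not the paper's implementation: the grouping strategy is not specified in this section, so the sketch assumes a simple all-pairs geometric check and scores each box by the mean of its two corner scores. The function names `group_corners`, `iou`, and `nms`, the `(x, y, score)` corner format, and the IoU threshold of 0.7 are all hypothetical choices made for illustration.

```python
from itertools import product


def group_corners(top_lefts, bottom_rights):
    """Pair corners into candidate boxes (assumed grouping rule).

    Each corner is (x, y, score). A pair is kept only if the bottom-right
    corner lies strictly below and to the right of the top-left corner.
    The box score is the mean of the two corner scores (an assumption).
    """
    proposals = []
    for (x1, y1, s1), (x2, y2, s2) in product(top_lefts, bottom_rights):
        if x2 > x1 and y2 > y1:
            proposals.append((x1, y1, x2, y2, (s1 + s2) / 2.0))
    return proposals


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_threshold=0.7):
    """Greedy non-maximum suppression: visit boxes in descending score order
    and keep a box only if it does not overlap an already kept box by more
    than iou_threshold (the threshold value is an assumption)."""
    kept = []
    for box in sorted(proposals, key=lambda p: p[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # Hypothetical corner predictions: (x, y, score).
    top_lefts = [(10, 10, 0.9), (12, 11, 0.6)]
    bottom_rights = [(50, 60, 0.8)]
    for box in nms(group_corners(top_lefts, bottom_rights)):
        print(box)
```

In this toy input the two candidate boxes overlap heavily, so only the higher-scoring one survives suppression. A real detector would read corners from dense heatmaps and would likely prune pairs before scoring them, which the all-pairs loop here does not attempt.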

* Chengjun Xie [email protected]

1 Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Science, Hefei 230031, China

2 University of Science and Technology of China, Hefei 230026, China

Multimedia Tools and Applications

1 Introduction

Detecting objects is one of the essential computer vision tasks; it aims to localize and identify objects in images and videos [32]. It is the basis of many other computer vision tasks, such as instance segmentation [47, 20], object tracking [28, 22], and image captioning [43, 9]. Recently, with the development of deep learning, object detection has achieved remarkable breakthroughs and attracted considerable research attention. It now has many practical applications, for example face detection [46, 45], autonomous driving [26], and medical diagnosis [36, 39].

Early object detectors mostly depended on hand-crafted features. Lacking effective feature representations, researchers had to design complex approaches to improve the capability of image representation. The most important of these is the Histogram of Oriented Gradients (HOG) [8] feature descriptor, which was viewed as a vital improvement over the scale-invariant feature transform [34, 33] and shape contexts [3] at that time. It has become a cornerstone of numerous object detect