Local keypoint-based Faster R-CNN



Xintao Ding 1,2 & Qingde Li 3 & Yongqiang Cheng 3 & Jinbao Wang 1,2 & Weixin Bian 1,2 & Biao Jie 1,2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Region-based Convolutional Neural Network (R-CNN) detectors have achieved state-of-the-art results on various challenging benchmarks. Despite this high detection performance, the role of local information in producing candidate regions has not been studied sufficiently. In this paper, we design a Keypoint-based Faster R-CNN (K-Faster) method for object detection. K-Faster incorporates local keypoints into Faster R-CNN to improve the detection performance. First, a sparse descriptor, which detects the points of interest in a given image, samples a local patch around each point, and describes its invariant features, is employed to produce keypoints. Second, all 2-combinations of the produced keypoints are used to generate keypoint anchors, which are helpful for object detection. Third, the heterogeneously distributed anchors are encoded in feature maps based on their areas and center coordinates. Finally, the keypoint anchors are coupled with the anchors produced by Faster R-CNN, and the coupled anchors are used for Region Proposal Network (RPN) training. Comparative experiments are conducted on PASCAL VOC 07/12 and MS COCO. The experimental results show that our K-Faster approach not only increases the mean Average Precision (mAP) but also improves the positioning precision of the detected boxes.

Keywords Keypoint · SIFT · Convolutional neural network · Faster R-CNN
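The keypoint-anchor step outlined in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each pair of keypoints is treated as two opposite corners of an axis-aligned box, the keypoints are given as plain (x, y) tuples (in practice they would come from a SIFT-like detector), and the function name `keypoint_anchors` and the `min_size` filter are hypothetical. The sketch also records the area and center of each box, since the abstract says the anchors are encoded by these two quantities.

```python
from itertools import combinations


def keypoint_anchors(keypoints, min_size=1.0):
    """Build one axis-aligned box per unordered pair of keypoints.

    Assumption (not stated in the abstract): the two keypoints of a pair
    are opposite corners of the box. `min_size` is a hypothetical filter
    that drops degenerate boxes with a very small width or height.
    """
    anchors = []
    # combinations(..., 2) enumerates all 2-combinations of the keypoints.
    for (x1, y1), (x2, y2) in combinations(keypoints, 2):
        left, right = min(x1, x2), max(x1, x2)
        top, bottom = min(y1, y2), max(y1, y2)
        width, height = right - left, bottom - top
        if width < min_size or height < min_size:
            continue  # skip boxes that collapse to a line or a point
        anchors.append({
            "box": (left, top, right, bottom),
            "area": width * height,
            "center": ((left + right) / 2.0, (top + bottom) / 2.0),
        })
    return anchors


# Three illustrative keypoints; two of them share the same y coordinate,
# so that pair yields a zero-height box and is filtered out.
points = [(10.0, 20.0), (50.0, 80.0), (30.0, 20.0)]
anchors = keypoint_anchors(points)
for anchor in anchors:
    print(anchor)
```

Since n keypoints give n(n-1)/2 pairs, the number of candidate boxes grows quadratically with the number of keypoints, so a practical implementation would likely need to limit or subsample the keypoints before pairing them.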

* Xintao Ding [email protected]
* Qingde Li [email protected]
Yongqiang Cheng [email protected]
Jinbao Wang [email protected]
Weixin Bian [email protected]
Biao Jie [email protected]

1 School of Computer and Information, Anhui Normal University, Wuhu 241002, China
2 Anhui Province Key Laboratory of Network and Information Security, Wuhu 241002, China
3 School of Engineering and Computer Science, University of Hull, Hull HU6 7RX, UK

1 Introduction

General object detection is a complex problem. One of its main tasks is localization, which assigns accurate bounding boxes to different objects [1]. In the last two decades, object detectors based on Convolutional Neural Networks (CNNs) [2–5] have achieved state-of-the-art results on various challenging benchmarks [6–8]. As two representative Region-based CNN (R-CNN) methods, both Fast/Faster R-CNN [3, 4] and Region-based Fully Convolutional Network (R-FCN) [9] use a Region Proposal Network (RPN) to generate region proposals. The RPN initializes anchors of different scales and aspect ratios at each location of the convolutional feature map [4]. Although these anchors may cover the object of interest, they do not focus on local information. When a human identifies an object, both global structural information and local individual information are used in the identification [10]. Our work is motivated by the following two questions. First, is