Single shot object detection with refined feature

Xiaojuan Zhang 1 & Changying Wang 1 & Li Cheng 1 & Shuihan Jiang 1 & Junting Qi 1

Received: 21 October 2019 / Revised: 22 July 2020 / Accepted: 28 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Object classification and localization are two significant aspects of object detectors based on the Single Shot MultiBox Detector (SSD). In general, the more feature maps there are, the better the object classification performance will be. However, when the information in the additional feature maps is sparse and unnecessary, detection performance improves only slightly or may even degrade, and object localization suffers. The performance of an object detector depends not only on the number of feature maps but also, in part, on bounding box regression and Non-Maximum Suppression (NMS). In this paper, we construct an SSD-based detector called Detection with Refined Feature (DRF), which introduces a center map and a scale map and reshapes the detection loss. Our motivation is to improve the accuracy of classification and localization by searching for central points and predicting the scales of the object points. The center map is used to predict the Intersection over Union (IoU) between the prediction box and the ground truth box, while the scale map considers the relationships among the different scales. Experimental results on both the Pascal VOC and MS COCO 2014 instance datasets demonstrate the effectiveness of DRF. Using Darknet53, we achieve an 86.4% mean Average Precision (mAP) on Pascal VOC2007 and an 87.4% mAP on Pascal VOC2007 and VOC2012. On MS COCO, DRF with ResNet50 still achieves a moderate improvement.

Keywords Object detection · SSD · Bounding box · Center map · Scale map
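For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of IoU and of the standard greedy NMS procedure. It is not the DRF method: the corner box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, in order of decreasing score.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold (an illustrative default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this toy input the second box overlaps the first with an IoU of 81/119, which is above the assumed threshold, so it is suppressed; the third box does not overlap either and is kept.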

Project supported by the Research on Pixel Coordinate Calibration Method for Video by Multi-Mobile Terminal Collaboration (No.CXZX2016029)

* Li Cheng [email protected]

1 College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China

Multimedia Tools and Applications

1 Introduction

Object detection not only classifies object categories but also predicts the location of each object. Each detected object is described by a bounding box that gives its precise location, together with a class label from the specified object categories. Redundant bounding boxes are removed by an NMS procedure [19]. Currently, object detection frameworks fall into two categories: two-stage detectors and one-stage detectors. Many object detectors have been proposed to improve accuracy and speed. Two-stage detectors commonly achieve better classification performance, while one-stage detectors are significantly more time-efficient and have greater applicability to real-time object detection [39]. Two-stage detectors first generate a sparse set of proposals with a proposal generator; region classifiers then predict the category of each proposed region. One-stage detectors directly make a definite