SSD: Single Shot MultiBox Detector

1 UNC Chapel Hill, Chapel Hill, USA
{wliu,cyfu,aberg}@cs.unc.edu
2 Zoox Inc., Palo Alto, USA
[email protected]
3 Google Inc., Mountain View, USA
{dumitru,szegedy}@google.com
4 University of Michigan, Ann Arbor, USA
[email protected]

Abstract. We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has accuracy competitive with methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X, and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
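
The default-box idea in the abstract can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' Caffe implementation. It tiles boxes of several aspect ratios at every cell of several feature maps, with coarser maps getting larger boxes. It counts the per-location output channels: one score per category plus four box adjustments for each default box. It also applies predicted adjustments to a default box. The function names, the linear scale schedule between `s_min=0.2` and `s_max=0.9`, the aspect ratios, the feature-map sizes, and the centre-size offset parameterization are all assumptions chosen for illustration.

```python
import math
from itertools import product


def default_boxes(feature_map_sizes, s_min=0.2, s_max=0.9,
                  aspect_ratios=(1.0, 2.0, 0.5)):
    """Tile default boxes (cx, cy, w, h), in normalized [0, 1] image
    coordinates, over every cell of every (square) feature map.

    Assumption: scales are spaced linearly from s_min (finest map) to
    s_max (coarsest map); the ratios are illustrative, not prescribed.
    """
    m = len(feature_map_sizes)
    scales = [s_min + (s_max - s_min) * k / max(m - 1, 1) for k in range(m)]
    boxes = []
    for f, s in zip(feature_map_sizes, scales):
        for i, j in product(range(f), repeat=2):
            # Centre of feature-map cell (row i, column j).
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for ar in aspect_ratios:
                # Area stays s**2 while the shape changes: w / h == ar.
                boxes.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
    return boxes


def head_channels(num_classes, boxes_per_location):
    """Output channels per feature-map location: for each default box,
    one score per category plus 4 box adjustments."""
    return boxes_per_location * (num_classes + 4)


def decode(default_box, offsets):
    """Apply predicted adjustments (dx, dy, dw, dh) to one default box.

    Assumption: centre-size parameterization, i.e. centre shifts are
    relative to the box size and width/height are rescaled via exp().
    """
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


if __name__ == "__main__":
    sizes = [38, 19, 10, 5, 3, 1]  # hypothetical feature-map resolutions
    boxes = default_boxes(sizes)
    print(len(boxes))  # 5820
    print(head_channels(21, 3))  # 75 (e.g. 20 VOC classes + background)
    print(decode(boxes[0], (0.0, 0.0, 0.0, 0.0)) == boxes[0])  # True
```

With three aspect ratios on six hypothetical feature maps of side 38, 19, 10, 5, 3, and 1, the sketch yields 3 × (38² + 19² + 10² + 5² + 3² + 1²) = 5820 default boxes. Every one of them is scored and adjusted in the same forward pass, so no separate proposal or resampling stage is needed.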

Keywords: Real-time object detection · Convolutional neural network

1 Introduction

Current state-of-the-art object detection systems are variants of the following approach: hypothesize bounding boxes, resample pixels or features for each box, and apply a high-quality classifier. This pipeline has prevailed on detection benchmarks since the Selective Search work [1] through the current leading results on PASCAL VOC, COCO, and ILSVRC detection, all based on Faster R-CNN [2], albeit with deeper features such as [3]. While accurate, these approaches have been too computationally intensive for embedded systems and, even with high-end hardware, too slow for real-time applications. Often detection speed for these approaches is measured in seconds per frame (SPF), and even the fastest high-accuracy detector, Faster R-CNN, operates at only 7 frames per second (FPS). There have been many attempts to build faster detectors by attacking each stage of the detection pipeline (see related work in Sect. 4), but so far, significantly increased speed comes only at the cost of significantly decreased detection accuracy. Thi

W. Liu et al. In: B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 21–37, 2016. © Springer International Publishing AG 2016. DOI: 10.1007/978-3-319-46448-0_2
