A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection


SVCL, UC San Diego, San Diego, USA
{zwcai,nuno}@ucsd.edu
IBM T. J. Watson Research, Yorktown Heights, USA
{qfan,rsferi}@us.ibm.com

Abstract. A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets such as KITTI and Caltech, which contain a substantial number of small objects.
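The abstract motivates deconvolution as a cheaper alternative to input upsampling. The rough multiply-accumulate (MAC) count below illustrates why, under assumptions that are ours rather than the paper's: a VGG-16-style trunk truncated at conv4_3, a hypothetical 384 × 1280 input (roughly KITTI-sized), and a single 4 × 4, stride-2 deconvolution that keeps 512 channels. Only convolution MACs are counted; pooling, activations, and memory traffic are ignored.

```python
# Rough MAC count: input upsampling vs. one deconvolution layer.
# Assumptions (ours, not the paper's): VGG-16-style trunk up to conv4_3,
# 3x3/stride-1/pad-1 convolutions, 2x2/stride-2 max pooling, and a
# 4x4 stride-2 deconvolution that keeps the channel count.

TRUNK = [  # (layer kind, output channels); pooling keeps the channel count
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool", None),
    ("conv", 512), ("conv", 512), ("conv", 512),
]


def trunk_cost(height, width, in_channels=3):
    """Return (MACs of all convolutions, output shape (h, w, c))."""
    macs = 0
    h, w, c = height, width, in_channels
    for kind, out_c in TRUNK:
        if kind == "conv":
            macs += h * w * out_c * c * 3 * 3  # one 3x3xC dot product per output value
            c = out_c
        else:
            h, w = h // 2, w // 2  # pooling halves each spatial dimension
    return macs, (h, w, c)


def deconv_cost(h, w, c, kernel=4):
    """MACs of a transposed convolution with c input and c output channels.

    Every input value is scattered through a kernel x kernel x c filter bank;
    the stride (2 here) sets the output size but not the MAC count.
    """
    return h * w * c * c * kernel * kernel


base, shape = trunk_cost(384, 1280)     # hypothetical KITTI-like input
upsampled, _ = trunk_cost(768, 2560)    # same image upsampled 2x
deconv = deconv_cost(*shape)            # 2x upsampling of the conv4_3 map instead
print(f"trunk: {base / 1e9:.1f} GMACs, 2x input: {upsampled / 1e9:.1f} GMACs, "
      f"trunk + deconv: {(base + deconv) / 1e9:.1f} GMACs")
# -> trunk: 136.7 GMACs, 2x input: 547.0 GMACs, trunk + deconv: 169.0 GMACs
```

Doubling the input resolution quadruples the work of every convolution in the trunk, while the deconvolution adds a single layer on the already downsampled conv4_3 map. Activation memory follows the same pattern, since every intermediate feature map also grows fourfold under input upsampling.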

Keywords: Object detection · Multi-scale · Unified neural network

1 Introduction

Classical object detectors, based on the sliding window paradigm, search for objects at multiple scales and aspect ratios. While real-time detectors are available for certain classes of objects, e.g. faces or pedestrians [1,2], it has proven difficult to build detectors of multiple object classes under this paradigm. Recently, there has been interest in detectors derived from deep convolutional neural networks (CNNs) [3–7]. While these have shown much greater ability to address the multiclass problem, less progress has been made towards the detection of objects at multiple scales. The R-CNN [3] samples object proposals at multiple scales, using a preliminary attention stage [8], and then warps these proposals to the size (e.g. 224 × 224) supported by the CNN. This is, however, very inefficient from a computational standpoint. The development of an effective and computationally efficient region proposal mechanism is still an open problem. The more recent Faster-RCNN [9] addresses the issue with a region proposal network (RPN), which enables end-to-end training. However, the RPN generates proposals of multiple scales by sliding a fixed set of filters over a fixed set of convolutional feature maps. This creates an inconsistency between the sizes of objects, which are variable, and filter receptive fields, which are fixed. As shown in Fig. 1, a fixed receptive field cannot cover the multiple scales at which objects appear in natural scenes. This compromises detection performance, which tends to be particularly poor for small objects, like that in the center of Fig. 1. In

Fig. 1. In natural images, objects can appear at very different scales, as illustrated by the yellow bounding boxes. A single receptive field, such as that of the RPN [9] (shown in the shaded area), cannot match this variability.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 354–370, 2016. DOI: 10.1007/978-3-319-46493-0_22
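The mismatch between fixed receptive fields and variable object sizes can be made concrete by computing receptive fields layer by layer. The sketch below assumes a VGG-16-style trunk (this section does not specify the backbone) and uses the standard recurrence for receptive field size and cumulative stride. Under these assumptions a single 3 × 3 window on conv5_3, given the hypothetical label `rpn_3x3`, always sees a 228-pixel region, whereas tapping conv3_3, conv4_3, and conv5_3 spans roughly 40 to 196 pixels. The `closest_layer` helper is a hypothetical nearest-size rule for illustration only, not MS-CNN's actual assignment of objects to output layers.

```python
# Receptive field and cumulative stride of candidate output layers.
# Assumption: a VGG-16-style trunk; MS-CNN's actual branch layers may differ.
# Recurrence per layer with kernel k and stride s:
#   r_out = r_in + (k - 1) * j_in,   j_out = j_in * s

LAYERS = [  # (name, kernel size, stride)
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
    ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
    ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1),
    ("rpn_3x3", 3, 1),  # hypothetical label: a 3x3 sliding window on conv5_3
]


def receptive_fields(layers):
    """Map each layer name to (receptive field in input pixels, stride)."""
    r, j = 1, 1
    fields = {}
    for name, k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k - 1) input strides
        j *= s            # strides compound multiplicatively
        fields[name] = (r, j)
    return fields


def closest_layer(object_size, fields, candidates):
    """Hypothetical rule: the candidate whose receptive field is nearest
    to the object size in pixels."""
    return min(candidates, key=lambda name: abs(fields[name][0] - object_size))


fields = receptive_fields(LAYERS)
print("single RPN window:", fields["rpn_3x3"])  # -> single RPN window: (228, 16)
for size in (30, 100, 250):
    print(size, "px ->", closest_layer(size, fields, ["conv3_3", "conv4_3", "conv5_3"]))
# prints: 30 px -> conv3_3 / 100 px -> conv4_3 / 250 px -> conv5_3
```

A single output layer commits every object to one receptive field size; several output layers at different depths offer a range of sizes, which is the intuition behind detecting at multiple output layers.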
