Accurate Object Detection with Location Relaxation and Regionlets Re-localization


Stevens Institute of Technology, Hoboken, NJ 07030, USA; NEC Laboratories America, Cupertino, CA 95014, USA; Facebook, Menlo Park, CA 94026, USA

Abstract. Standard sliding-window object detection requires dense classifier evaluation at densely sampled locations in scale space to achieve accurate localization. To avoid such dense evaluation, selective-search-based algorithms evaluate the classifier only on a small subset of object proposals. Notwithstanding their demonstrated success, object proposals do not guarantee perfect overlap with the object, which leads to suboptimal detection accuracy. To address this issue, we propose to first relax the dense sampling of the scale space with coarse object proposals generated from bottom-up segmentations. Based on detection results on these proposals, we then conduct a top-down search to localize the object more precisely using supervised descent. This two-stage detection strategy, dubbed location relaxation, is able to localize the object in the continuous parameter space. Furthermore, there is a conflict between accurate object detection and robust object detection, because achieving the latter requires accommodating inaccurate and perturbed object locations in the training phase. To address this conflict, we leverage the rich spatial information learned from the Regionlets detection framework to determine where the object is precisely localized. Our proposed approaches are extensively validated on the PASCAL VOC 2007 dataset and a self-collected large-scale car dataset. Our method boosts the mean average precision of the current state of the art (41.7 %) to 44.1 % on the PASCAL VOC 2007 dataset. To the best of our knowledge, this is the best performance reported without using outside data (convolutional neural network based approaches are commonly pre-trained on a large-scale outside dataset and fine-tuned on the VOC dataset).

1 Introduction

An object may appear at any location and scale in an image, as defined by the continuous parameter space spanned by (x, y, s, a), where (x, y) is the object center point, and s and a are the scale and aspect ratio of the object. In particular, different aspect ratios generally correspond to different viewpoints, leaving a difficult open question for robust object detection.

© Springer International Publishing Switzerland 2015. D. Cremers et al. (Eds.): ACCV 2014, Part I, LNCS 9003, pp. 260–275, 2015. DOI: 10.1007/978-3-319-16865-4_17
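To make the (x, y, s, a) parameterization concrete, the following minimal sketch maps a parameter tuple to an axis-aligned box and back. The text does not spell out how s and a are defined, so the sketch assumes s is the square root of the box area and a is width divided by height; the names `Box`, `params_to_box`, and `box_to_params` are hypothetical and not taken from the paper.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box given by its top-left corner and size (pixels)."""
    left: float
    top: float
    width: float
    height: float


def params_to_box(x: float, y: float, s: float, a: float) -> Box:
    """Convert (x, y, s, a) to a box.

    Assumption (not stated in the source): s = sqrt(width * height)
    and a = width / height.
    """
    width = s * math.sqrt(a)
    height = s / math.sqrt(a)
    return Box(x - width / 2.0, y - height / 2.0, width, height)


def box_to_params(box: Box) -> Tuple[float, float, float, float]:
    """Inverse mapping under the same assumed definitions of s and a."""
    x = box.left + box.width / 2.0
    y = box.top + box.height / 2.0
    s = math.sqrt(box.width * box.height)
    a = box.width / box.height
    return x, y, s, a
```

Under these assumed definitions, every box corresponds to exactly one point in the continuous parameter space, so a dense sliding-window search can be viewed as sampling that space on a fixed grid.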


Fig. 1. Sample detection results applying our detection framework to the PASCAL VOC 2007 dataset. First row: bus and boat detection. Second row: bottle, aeroplane and bird detection. Third row: bicycle detection.

In order to accurately localize the object in the image, sliding-window detectors [1–5] require densely sampling a fixed-size candidate object window (i.e., a base window) from the continuous parameter space at each scale of a scale-space image pyramid. Then, a binary decision is made for each s
