Accurate Object Detection with Location Relaxation and Regionlets Re-localization




Stevens Institute of Technology, Hoboken, NJ 07030, USA
NEC Laboratories America, Cupertino, CA 95014, USA
[email protected]
Facebook, Menlo Park, CA 94026, USA

Abstract. Standard sliding window based object detection requires dense classifier evaluation on densely sampled locations in scale space in order to achieve accurate localization. To avoid such dense evaluation, selective search based algorithms evaluate the classifier only on a small subset of object proposals. Despite their demonstrated success, object proposals do not guarantee perfect overlap with the object, leading to suboptimal detection accuracy. To address this issue, we propose to first relax the dense sampling of the scale space with coarse object proposals generated from bottom-up segmentations. Based on detection results on these proposals, we then conduct a top-down search to localize the object more precisely using supervised descent. This two-stage detection strategy, dubbed location relaxation, is able to localize the object in the continuous parameter space. Furthermore, there is a conflict between accurate object detection and robust object detection, because achieving the latter requires accommodating inaccurate and perturbed object locations in the training phase. To address this conflict, we leverage the rich spatial information learned from the Regionlets detection framework to determine where the object is precisely localized. Our proposed approaches are extensively validated on the PASCAL VOC 2007 dataset and a self-collected large-scale car dataset. Our method boosts the mean average precision of the current state of the art (41.7 %) to 44.1 % on the PASCAL VOC 2007 dataset. To the best of our knowledge, it is the best performance reported without using outside data (convolutional neural network based approaches are commonly pre-trained on a large-scale outside dataset and fine-tuned on the VOC dataset).

1 Introduction

An object may appear at any location and scale in an image, as defined by the continuous parameter space spanned by (x, y, s, a), where (x, y) is the object center point, and s and a are the scale and aspect ratio of the object. In particular, different aspect ratios generally correspond to different viewpoints, which leaves robust object detection a difficult open question.

© Springer International Publishing Switzerland 2015. D. Cremers et al. (Eds.): ACCV 2014, Part I, LNCS 9003, pp. 260–275, 2015. DOI: 10.1007/978-3-319-16865-4_17
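To make the (x, y, s, a) parameterization from the introduction concrete, the following minimal sketch converts a window in that space to box corners and measures how well two windows overlap. The text does not state how s and a are normalized, so the convention used here (s as the square root of the box area, a as width divided by height) is an assumption, and the names `Box`, `window_to_box`, and `iou` are illustrative rather than taken from the paper.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given by its corner coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float


def window_to_box(x: float, y: float, s: float, a: float) -> Box:
    """Map a point (x, y, s, a) of the parameter space to box corners.

    Assumed convention (not specified in the source):
    s = sqrt(width * height) and a = width / height.
    """
    w = s * math.sqrt(a)
    h = s / math.sqrt(a)
    return Box(x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(p: Box, q: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    iw = max(0.0, min(p.x2, q.x2) - max(p.x1, q.x1))
    ih = max(0.0, min(p.y2, q.y2) - max(p.y1, q.y1))
    inter = iw * ih
    area_p = (p.x2 - p.x1) * (p.y2 - p.y1)
    area_q = (q.x2 - q.x1) * (q.y2 - q.y1)
    union = area_p + area_q - inter
    return inter / union if union > 0 else 0.0


# A proposal whose center is shifted by 10 pixels from the true object:
truth = window_to_box(100.0, 100.0, 50.0, 2.0)
proposal = window_to_box(110.0, 100.0, 50.0, 2.0)
overlap = iou(truth, proposal)  # strictly between 0 and 1
```

Under this convention, a small offset in any of the four parameters lowers the overlap below 1, which is the kind of imperfect proposal that a finer localization step would aim to correct.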


Fig. 1. Sample detection results applying our detection framework to the PASCAL VOC 2007 dataset. First row: bus and boat detection. Second row: bottle, aeroplane and bird detection. Third row: bicycle detection.

In order to accurately localize the object in the image, sliding window based detectors [1–5] require densely sampling a fixed-size candidate object window (i.e., a base window) from the continuous parameter space at each scale of a scale-space image pyramid. Then, a binary decision is made for each s