Is Faster R-CNN Doing Well for Pedestrian Detection?

1 School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
2 Microsoft Research, Beijing, China

Abstract. Detecting pedestrians has arguably been addressed as a special topic beyond general object detection. Although recent deep learning object detectors such as Fast/Faster R-CNN have shown excellent performance for general object detection, they have had limited success in detecting pedestrians, and previous leading pedestrian detectors were in general hybrid methods combining hand-crafted and deep convolutional features. In this paper, we investigate issues involving Faster R-CNN for pedestrian detection. We discover that the Region Proposal Network (RPN) in Faster R-CNN indeed performs well as a stand-alone pedestrian detector, but surprisingly, the downstream classifier degrades the results. We argue that two reasons account for the unsatisfactory accuracy: (i) insufficient resolution of feature maps for handling small instances, and (ii) lack of any bootstrapping strategy for mining hard negative examples. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. We comprehensively evaluate this method on several benchmarks (Caltech, INRIA, ETH, and KITTI), presenting competitive accuracy and good speed. Code will be made publicly available.

Keywords: Pedestrian detection · Convolutional neural networks · Boosted forests · Hard-negative mining
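As a rough illustration of the bootstrapping idea named in point (ii) of the abstract, the sketch below shows a generic hard-negative mining loop: train on a small random set of negatives, collect the negatives that the current model wrongly scores as pedestrians, add them to the training set, and retrain. It is not the method of this paper. A nearest-centroid scorer stands in for the boosted forest, the feature vectors are plain tuples rather than convolutional features, and every function name and parameter (`train_centroid_scorer`, `mine_hard_negatives`, `bootstrap_train`, `n_initial`, `n_rounds`, `max_new`) is a hypothetical choice made for this example.

```python
import random


def train_centroid_scorer(positives, negatives):
    """Stand-in classifier (NOT a boosted forest): compares distances to class centroids."""
    def centroid(points):
        n = len(points)
        return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

    cp, cn = centroid(positives), centroid(negatives)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def score(x):
        # Positive score means "closer to the pedestrian centroid".
        return dist2(x, cn) - dist2(x, cp)

    return score


def mine_hard_negatives(score, negative_pool, max_new):
    """Return up to max_new negatives that the model scores as positive, hardest first."""
    false_positives = [x for x in negative_pool if score(x) > 0]
    false_positives.sort(key=score, reverse=True)
    return false_positives[:max_new]


def bootstrap_train(positives, negative_pool, n_initial=20, n_rounds=3,
                    max_new=50, seed=0):
    """Bootstrapped training: retrain after each round of hard-negative mining."""
    rng = random.Random(seed)
    # Start from a small random subset of the negative pool (hypothetical size).
    train_neg = rng.sample(negative_pool, n_initial)
    score = train_centroid_scorer(positives, train_neg)
    for _ in range(n_rounds):
        hard = [x for x in mine_hard_negatives(score, negative_pool, max_new)
                if x not in train_neg]
        if not hard:
            break  # no new false positives left to learn from
        train_neg.extend(hard)
        score = train_centroid_scorer(positives, train_neg)
    return score, train_neg
```

In a real detector the negative pool would be background proposals (for example, those produced by an RPN) and the scorer would be replaced by the actual classifier; only the loop structure carries over.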

1 Introduction

Pedestrian detection, as a key component of real-world applications such as automatic driving and intelligent surveillance, has attracted special attention beyond general object detection. Despite the prevalent success of deeply learned features in computer vision, current leading pedestrian detectors (e.g., [1–4]) are in general hybrid methods that combine traditional, hand-crafted features [5,6] and deep convolutional features [7,8]. For example, in [1] a stand-alone pedestrian detector [9] (that uses Squares Channel Features) is adopted as a highly selective proposer (