Stereo Frustums: a Siamese Pipeline for 3D Object Detection
Xi Mo¹ · Usman Sajid¹ · Guanghui Wang²

¹ Department of Electrical Engineering and Computer Science, School of Engineering, University of Kansas, Lawrence, KS 66045, USA
² Department of Computer Science, Ryerson University, Toronto, ON M5B 2K3, Canada

Received: 19 July 2020 / Accepted: 27 October 2020
© Springer Nature B.V. 2020
Abstract
The paper proposes a lightweight stereo frustums matching module for 3D object detection. The proposed framework takes advantage of a high-performance 2D detector and a point cloud segmentation network to regress 3D bounding boxes for autonomous driving vehicles. Instead of performing traditional stereo matching to compute disparities, the module directly takes the 2D proposals from both the left and the right views as input. Based on the epipolar constraints recovered from the well-calibrated stereo cameras, we propose four matching algorithms to search for the best match for each proposal between the stereo image pairs. Each matching pair proposes a segmentation of the scene, which is then fed into a 3D bounding box regression network. Results of extensive experiments on the KITTI dataset demonstrate that the proposed Siamese pipeline outperforms the state-of-the-art stereo-based 3D bounding box regression methods.

Keywords Stereopsis · LiDAR · Stereo matching · Epipolar constraint · Segmentation · Amodal regression
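As a rough illustration of the epipolar-constrained proposal matching described in the abstract (a minimal sketch, not one of the paper's four matching algorithms), the snippet below greedily pairs left- and right-view 2D boxes on a rectified stereo rig: a valid match must share nearly the same vertical extent, and the right-view box must lie at a smaller horizontal position (non-negative disparity). The box format and the `max_y_shift` tolerance are assumptions for this sketch.

```python
import numpy as np

def match_stereo_proposals(left_boxes, right_boxes, max_y_shift=5.0):
    """Greedily match 2D proposals between rectified stereo views.

    Boxes are (x1, y1, x2, y2) in pixels. After rectification the
    epipolar lines are horizontal, so a true match shares nearly the
    same vertical extent, and the right-view box sits at a smaller
    (or equal) x than its left-view counterpart (non-negative
    disparity). `max_y_shift` is a hypothetical pixel tolerance.
    """
    matches, used = [], set()
    for i, lb in enumerate(left_boxes):
        best_j, best_cost = None, np.inf
        for j, rb in enumerate(right_boxes):
            if j in used:
                continue
            # Epipolar constraint: top and bottom edges must agree.
            dy = abs(lb[1] - rb[1]) + abs(lb[3] - rb[3])
            if dy > 2 * max_y_shift:
                continue
            # Non-negative disparity: right box cannot lie to the right.
            if rb[0] > lb[0] + max_y_shift:
                continue
            if dy < best_cost:
                best_j, best_cost = j, dy
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```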
1 Introduction

How to regress accurate 3D bounding boxes (bbox) for autonomous driving vehicles has recently become a pivotal topic. This technique can also benefit mobile robots and unmanned aerial vehicles with regard to scene understanding and reasoning. In this paper, we propose a Siamese pipeline for 3D object detection. Given a pair of stereo images and the point cloud collected by a Velodyne LiDAR [5], many deep-learning-based approaches have been proposed to generate 3D bboxes, which can also be projected to a bird's-eye view (BEV) of the LiDAR data for localization evaluation. According to the number of image views utilized, these approaches can be divided into three categories: monocular-view [3, 8, 11, 16, 21, 23, 27, 28], binocular-view [2, 7, 10, 19, 26], and non-view approaches [13, 22, 25, 30–32, 35] that process only point clouds. Mono-view approaches focus on fusing camera and LiDAR sensors in either a global or a local manner, while non-view approaches extract point cloud features from hand-crafted voxels or raw coordinates. Compared to the extensive development in both categories mentioned above, there are fewer stereo-based and stereopsis-LiDAR-fusion works for 3D object detection. Regarding the runtime of stereo matching, a coarse disparity map generated by fast stereo matching with GPU acceleration achieves real-time frame rates, yet yields less accurate 3D detection results [7] than a coarse-to-fine disparity map [26]. However, it usually takes considerably longer to compute a coarse-to-fine disparity map.
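To make the BEV evaluation mentioned above concrete, here is a minimal sketch, assuming the KITTI camera convention (x right, y down, z forward), that projects a 3D box's footprint onto the ground plane; the function name and signature are illustrative, not part of the paper's pipeline.

```python
import numpy as np

def box3d_to_bev_corners(cx, cz, length, width, ry):
    """Footprint of a 3D box on the BEV (x-z ground) plane.

    Assumes the KITTI camera frame (x right, y down, z forward);
    `ry` is the yaw around the y axis. Returns a (4, 2) array of
    (x, z) corner coordinates.
    """
    half_l, half_w = length / 2.0, width / 2.0
    # Axis-aligned corners of the footprint around the box center.
    corners = np.array([
        [ half_l,  half_w],
        [ half_l, -half_w],
        [-half_l, -half_w],
        [-half_l,  half_w],
    ])
    c, s = np.cos(ry), np.sin(ry)
    rot = np.array([[c,  s],
                    [-s, c]])  # yaw about the (downward) y axis
    return corners @ rot.T + np.array([cx, cz])
```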