An FPGA-based real-time occlusion robust stereo vision system using semi-global matching

ORIGINAL RESEARCH PAPER

Lucas F. S. Cambuim · Luiz A. Oliveira Jr. · Edna N. S. Barros · Antonyus P. A. Ferreira
Informatics Center, Federal University of Pernambuco, Recife, Pernambuco, Brazil

Received: 7 September 2018 / Accepted: 19 July 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Stereo matching approaches are an appealing choice for acquiring depth information in a number of video processing applications. It is desirable that these solutions generate dense, robust disparity maps in real time; however, occluded regions can disturb the applications that rely on these maps. Among the most accurate of these approaches is the semi-global matching (SGM) technique. This paper presents an FPGA-based stereo vision system built on SGM. The system computes disparity maps in a streaming fashion and is scalable to several resolutions and disparity ranges. To further increase the robustness of the SGM technique, this work combines a gradient filter with the sampling-insensitive absolute difference in the pre-processing phase. Furthermore, as a post-processing step, this paper proposes a novel streaming architecture to detect noisy and occluded regions. FPGA-based implementations of the proposed stereo matching system on two distinct heterogeneous architectures (GPP, a general-purpose processor, plus FPGA) were evaluated using the Middlebury stereo vision benchmark. The system achieves a frame rate of 25 FPS when processing disparity maps at HD resolution (1024 × 768 pixels) with 256 disparity levels. The results demonstrate that its memory utilization, processing performance, and accuracy are among the best of FPGA-based stereo vision systems.

Keywords: Stereo matching · Real time · High precision · Occlusion check · FPGA
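As background for the matching cost named in the abstract, the sketch below shows one common formulation of a sampling-insensitive absolute difference in the style of Birchfield and Tomasi: the left intensity is compared against an interval around the right intensity rather than against a single sample. It is an illustrative sketch only; the exact variant implemented on the FPGA may differ.

```python
# Background sketch: sampling-insensitive absolute difference
# (Birchfield-Tomasi style). Illustrates the kind of matching cost named
# in the abstract; not the paper's exact FPGA implementation.

def bt_dissimilarity(left_row, right_row, x, d):
    """Dissimilarity between left_row[x] and right_row[x - d].

    Instead of |I_L(x) - I_R(x - d)|, the left intensity is compared against
    the interval spanned by the right intensity and its two half-pixel
    linear interpolations, making the cost insensitive to image sampling.
    """
    xr = x - d
    i_l = float(left_row[x])
    i_r = float(right_row[xr])
    # Half-pixel interpolated right intensities (clamped at the row borders).
    i_r_minus = 0.5 * (i_r + float(right_row[max(xr - 1, 0)]))
    i_r_plus = 0.5 * (i_r + float(right_row[min(xr + 1, len(right_row) - 1)]))
    i_min = min(i_r, i_r_minus, i_r_plus)
    i_max = max(i_r, i_r_minus, i_r_plus)
    # Distance from the left intensity to the interval [i_min, i_max].
    return max(0.0, i_l - i_max, i_min - i_l)

# Example: identical rows give zero cost at disparity 0.
row = [10, 20, 30, 40]
assert bt_dissimilarity(row, row, 2, 0) == 0.0
```

The full Birchfield-Tomasi measure is symmetric, taking the minimum of this term and the mirrored term computed against interpolations of the left image.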

1 Introduction

Stereo vision comprises a family of computer vision approaches for calculating the depth between a camera system and the objects in a scene. Typically, a stereo vision system uses two cameras to capture images of the same scene from two viewpoints. Stereo matching techniques search for the disparity between pixels in the two images that correspond to the same real 3D point in the scene; the depth can then be calculated from the inverse of this disparity. Applications in robotics, such as 3D reconstruction [1], object detection and recognition [2], and autonomous navigation [3], demand that stereo vision systems compute dense disparity maps in real time.
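To make the inverse relation between disparity and depth concrete, the sketch below converts a pixel disparity into metric depth using the standard rectified-stereo model Z = f·B/d, where f is the focal length in pixels and B is the camera baseline. The focal length and baseline values are illustrative assumptions, not parameters of the system described in this paper.

```python
# Minimal sketch: depth from disparity for a rectified stereo pair.
# The calibration values below are hypothetical, chosen only for illustration.
FOCAL_LENGTH_PX = 1200.0   # focal length expressed in pixels (assumed)
BASELINE_M = 0.12          # distance between the two cameras in meters (assumed)

def depth_from_disparity(disparity_px: float) -> float:
    """Return depth in meters for a disparity given in pixels.

    Depth is inversely proportional to disparity: Z = f * B / d.
    A zero or negative disparity is treated as an invalid / infinite-depth pixel.
    """
    if disparity_px <= 0:
        return float("inf")
    return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

# Example: a 64-pixel disparity corresponds to 1200 * 0.12 / 64 = 2.25 m.
print(depth_from_disparity(64))
```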

Moreover, owing to advances in camera technology, there is a growing need for systems capable of processing high-resolution images. For instance, the full-size Middlebury stereo pairs [4] have an average resolution of 1.4 megapixels and a disparity range of 200 pixels. The greater the disparity range and the image resolution, the better the system can distinguish objects at different depths. The quality of the depth mea