Real-time monocular depth estimation with adaptive receptive fields


SPECIAL ISSUE PAPER

Zhenyan Ji¹ · Xiaojun Song¹ · Xiaoxuan Guo¹ · Fangshi Wang¹ · José Enrique Armendáriz‑Iñigo²

Received: 27 January 2020 / Accepted: 9 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Monocular depth estimation is a popular research topic in the field of autonomous driving. Many current models lead in accuracy but perform poorly in real-time scenarios. To increase the efficiency of depth estimation, we propose a novel model that combines a multi-scale pyramid architecture with adaptive receptive fields. The pyramid architecture reduces the number of trainable parameters from dozens of millions to fewer than 10 million. Adaptive receptive fields are more sensitive to objects at different depths and distances in an image, which leads to better accuracy. We adopt stacked convolution kernels instead of raw kernels to compress the model. As a result, the proposed model performs well in both real-time performance and estimation accuracy. Our experiments show that the model outperforms previously known models on the Eigen split. In runtime performance, it is also faster than all of the compared models except Pyd-Net. In summary, our model is a lightweight depth estimation model with state-of-the-art accuracy.

Keywords Monocular depth estimation · Adaptive receptive field · Real-time performance · Convolutional neural network
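The claim that stacked kernels compress the model can be illustrated with simple parameter arithmetic. The sketch below assumes the common case of replacing one 5x5 convolution with two stacked 3x3 convolutions that cover the same 5x5 receptive field. The kernel sizes and channel counts actually used in the paper are not given in this excerpt, so the helper names `conv_params` and `stacked_receptive_field` and the channel width of 64 are illustrative assumptions only.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int,
                bias: bool = True) -> int:
    """Number of trainable parameters in one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # hypothetical channel width, not taken from the paper

# One "raw" 5x5 kernel versus two stacked 3x3 kernels.
single_5x5 = conv_params(5, channels, channels)
stacked_3x3 = 2 * conv_params(3, channels, channels)

print("receptive field of two 3x3 layers:", stacked_receptive_field(3, 2))
print("parameters, single 5x5:", single_5x5)
print("parameters, stacked 3x3:", stacked_3x3)
```

With these assumed sizes the stacked pair needs roughly 28% fewer parameters than the single 5x5 layer while covering the same receptive field.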

* Zhenyan Ji [email protected]

1 School of Software Engineering, Beijing Jiaotong University, 100044 Beijing, China
2 Department of Statistics, Computer Science and Mathematics, Public University of Navarre, 31006 Pamplona, Spain

1 Introduction

Autonomous driving is one of the most popular areas in artificial intelligence (AI) research today, and depth estimation is a key technique for realizing it. Depth estimation is applied to images obtained from binocular or monocular cameras. Because of the scale uncertainty of monocular images, binocular depth estimation is usually more accurate than monocular depth estimation. However, there are subtle differences between the depth maps generated from the left-view and right-view images taken by binocular cameras, and accurate registration of the left and right images is the main challenge for binocular depth estimation. In addition, the quality of binocular images is limited by the cost of the lenses and by the installation sites of the cameras. Depth estimation places high demands on imaging quality, such as the degree of distortion and the focus. The imaging error of a single universal camera lens is currently around 5%, which means that the error of a binocular camera is considerably higher than 5%. To guarantee low-error imaging, advanced and costly lenses are required. As for the installation site, the distance between the two cameras must be accurate; otherwise, the accuracy of depth estimation suffers. Besides, change in temperatur
