Monocular depth estimation based on deep learning: An overview
https://doi.org/10.1007/s11431-020-1582-8
Special Topic: Industrial Artificial Intelligence
• Review •
ZHAO ChaoQiang, SUN QiYu, ZHANG ChongZhen, TANG Yang* & QIAN Feng
Key Laboratory of Advanced Control and Optimization for Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
Received February 27, 2020; accepted March 25, 2020; published online June 10, 2020
Depth information is important for autonomous systems to perceive their environments and estimate their own state. Traditional depth estimation methods, such as structure from motion and stereo vision matching, rely on feature correspondences across multiple viewpoints, and the depth maps they predict are sparse. Inferring depth from a single image (monocular depth estimation) is an ill-posed problem. With the rapid development of deep neural networks, monocular depth estimation based on deep learning has been widely studied in recent years and has achieved promising accuracy. Deep neural networks also estimate dense depth maps from single images in an end-to-end manner. To improve the accuracy of depth estimation, different kinds of network frameworks, loss functions and training strategies have been proposed. In this review, we therefore survey current monocular depth estimation methods based on deep learning. First, we summarize several widely used datasets and evaluation indicators in deep learning-based depth estimation. Next, we review some representative existing methods according to their training manner: supervised, unsupervised and semi-supervised. Finally, we discuss the challenges and provide some ideas for future research in monocular depth estimation.
Keywords: autonomous systems, monocular depth estimation, deep learning, unsupervised learning
Citation:
Zhao C Q, Sun Q Y, Zhang C Z, et al. Monocular depth estimation based on deep learning: An overview. Sci China Tech Sci, 2020, 63, https://doi.org/10.1007/s11431-020-1582-8
1 Introduction
Estimating depth information from images is one of the basic and important tasks in computer vision, with wide use in simultaneous localization and mapping (SLAM) [1], navigation [2], object detection [3] and semantic segmentation [4].
Geometry-based methods: Recovering 3D structures from a couple of images based on geometric constraints is a popular way to perceive depth, and it has been widely investigated over the past forty years. Structure from motion (SfM) [5] is a representative method for estimating 3D structures from a sequence of 2D images, and it has been applied successfully in 3D reconstruction [6] and SLAM [7]. SfM recovers the depth of sparse features through feature correspondences and geometric constraints between image sequences, so the accuracy of depth estimation relies heavily on exact feature matching and high-quality image sequences. Furthermore, SfM suffers from monocular