Coarse-to-fine Planar Regularization for Dense Monocular Depth Estimation


1 Toshiba Research Europe, Cambridge, UK
[email protected]
2 University of Oxford, Oxford, UK

Abstract. Simultaneous localization and mapping (SLAM) using the whole image data is an appealing framework to address shortcomings of sparse feature-based methods, in particular frequent failures in textureless environments. Hence, direct methods that bypass the need for feature extraction and matching have recently become popular. Many of these methods operate by alternating between pose estimation and computing (semi-)dense depth maps, and therefore do not fully exploit the advantages of joint optimization with respect to depth and pose. In this work, we propose a framework for monocular SLAM, and its local model in particular, which optimizes simultaneously over depth and pose. In addition to a planarity-enforcing smoothness regularizer for the depth, we also constrain the complexity of depth map updates, which provides a natural way to avoid poor local minima and reduces the number of unknowns in the optimization. Starting from a holistic objective, we develop a method suitable for online and real-time monocular SLAM. We evaluate our method quantitatively in pose and depth on the TUM dataset, and qualitatively on our own video sequences.

Keywords: SLAM · Monocular odometry · Dense tracking and mapping

1 Introduction

Simultaneous localization and mapping (SLAM), also known as online structure from motion, aims to produce trajectory estimates and a 3D reconstruction of the environment in real time. In modern technology, its applications range from autonomous driving, navigation and robotics to interactive learning, gaming and enhanced reality [1–7]. Typically, SLAM comprises two key components: (1) a local model, which generates fast initial odometry measurements (often including a local 3D reconstruction, e.g. a depth map, as a byproduct), and (2) a global model, which performs loop closures and pose refinement via large-scale, sub-real-time bundle adjustment. In our work, we focus on the former, and propose a new strategy for local monocular odometry and depth map estimation.

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46475-6_29) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part II, LNCS 9906, pp. 458–474, 2016. DOI: 10.1007/978-3-319-46475-6_29

[Figure 1 panels: Keyframe, Frame 50, Frame 100, Frame 150, Frame 200, Frame 250]

Fig. 1. During keyframe-to-frame comparison a dense depth map is built. Image, point cloud and depth (top to bottom) are shown as they develop, for selected frames from a single keyframe. (While depth is dense at the keyframe, its projections may not be.)

Estimating the 3D position of tracked landmarks is a key ingredient in any SLAM system, since it directly allows for the poses to be computed w.r.t. a common coordinate frame. Historically, visual landmarks are induce