Improving Constrained Bundle Adjustment Through Semantic Scene Labeling



A. Salehi et al.

1 CEA LIST, Vision and Content Engineering Lab, Point Courrier 94, 91191 Gif-sur-Yvette, France
{achkan.salehi,vincent.gay-bellile,steve.bourgeois}@cea.fr
2 Institut Pascal, UMR 6602 CNRS, Clermont-Ferrand, France
[email protected]

Abstract. There is no doubt that SLAM and deep learning methods can benefit from each other. Most recent approaches to coupling those two subjects, however, either use SLAM to improve the learning process, or tend to ignore the geometric solutions that are currently used by SLAM systems. In this work, we focus on improving city-scale SLAM through the use of deep learning. More precisely, we propose to use CNN-based scene labeling to geometrically constrain bundle adjustment. Our experiments indicate a considerable increase in robustness and precision.

Keywords: SLAM · VSLAM · Bundle adjustment · Deep learning · Scene labeling

1 Introduction

The problem of the drift of monocular visual simultaneous localization and mapping (VSLAM) in seven degrees of freedom is well known. Fusion of VSLAM, in particular key-frame bundle adjustment (BA) [1,2], with data from various sensors (e.g. GPS, IMU [3–5]) and databases (e.g. textured or textureless 3d models, digital elevation models [6–8]) has proven to be a reliable solution to this problem. In this paper, we focus on fusion through constrained BA [2–4,6]. Among the available sensors and databases that can be used in constrained BA, textureless 3d building models are of particular interest, since the geometric constraints they impose on the reconstruction can prevent scale drift and also help in the estimation of camera yaw. Furthermore, they can be used to limit the impact of GPS bias on the reconstruction [9]. They are also, as opposed to textured models, widespread and easily (usually freely) available. However, methods that make use of such partial knowledge of the environment [6,8] face the problem of data association between 3d points and 3d building planes; that is, they must design a reliable method to segment the 3d point cloud and determine which points belong to buildings. In previous works [6,7], data association between 3d points and building models has been made by means of simple geometric constraints instead of photometric ones, due to the high cost of scene labeling algorithms. Unfortunately, these simple geometric criteria often introduce high amounts of noise, which can lead to failure even when used in conjunction with M-estimators or RANSAC-like algorithms. This is especially true when building facades are completely occluded by nearby objects on which a large number of interest points are detected (e.g. trees, advertising boards, etc.). Since these methods clearly reach their limits in such environments, we must investigate the alternative solution, namely scene labeling. While current s

© Springer International Publishing Switzerland 2016
G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part III, LNCS 9915, pp. 133–142, 2016.
DOI: 10.1007/978-3-319-49409-8_13
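To make the idea of constraining BA with textureless building models concrete, the sketch below shows how a point-to-plane term can be added to a standard reprojection cost for the 3d points associated with a building facade. This is a minimal illustration, not the authors' implementation: the function names, the simple pinhole model, and the uniform weighting of the plane term are assumptions made for the example.

```python
import numpy as np

def reprojection_residual(K, R, t, X, uv):
    """Pixel residual of 3D point X observed at pixel uv (pinhole camera)."""
    x_cam = R @ X + t            # world -> camera
    proj = K @ x_cam             # camera -> homogeneous pixel coordinates
    return proj[:2] / proj[2] - uv

def point_to_plane_residual(X, plane, weight=1.0):
    """Signed distance of X to the plane n.X + d = 0 (n a unit normal)."""
    n, d = plane
    return weight * (n @ X + d)

def constrained_ba_cost(K, R, t, points, observations, plane, labels):
    """Sum of squared reprojection errors; points labeled as belonging to a
    building facade (e.g. by a scene-labeling CNN) additionally pay a
    squared point-to-plane penalty against the building-model plane."""
    cost = 0.0
    for X, uv, is_building in zip(points, observations, labels):
        r = reprojection_residual(K, R, t, X, uv)
        cost += float(r @ r)
        if is_building:          # the data-association decision discussed above
            cost += point_to_plane_residual(X, plane) ** 2
    return cost
```

A mislabeled point (e.g. one detected on a tree occluding the facade) would inject a large, wrong plane residual here, which is exactly why reliable point-to-building association matters before such a term is used inside an optimizer.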