RobotFusion: Grasping with a Robotic Manipulator via Multi-view Reconstruction
Abstract. We propose a complete system for 3D object reconstruction and grasping based on an articulated robotic manipulator. We deploy an RGB-D sensor as an end effector placed directly on the robotic arm, and process the acquired data to perform multi-view 3D reconstruction and object grasping. We leverage the high repeatability of the robotic arm to estimate 3D camera poses with millimeter accuracy and control each of the sensor's six DOF in a dexterous workspace. Thereby, we can estimate camera poses directly by robot kinematics and deploy a Truncated Signed Distance Function (TSDF) to accurately fuse multiple views into a unified 3D reconstruction of the scene. Then, we propose an efficient approach to segment the sought objects out of a planar workbench as well as a novel algorithm to automatically estimate grasping points.

Keywords: Grasping · Manipulation · Reconstruction
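As a concrete point of reference for the segmentation step mentioned above, objects standing on a planar workbench are commonly isolated by fitting the dominant plane with RANSAC and removing its inliers. The sketch below is a generic illustration of that idea, not the paper's own algorithm; the distance threshold and iteration count are assumed values.

```python
import numpy as np

def remove_dominant_plane(points, n_iters=500, dist_thresh=0.005, rng=None):
    """Separate objects from a planar workbench via RANSAC plane removal.

    points      : (N, 3) reconstructed scene points, e.g. in the robot base frame
    dist_thresh : point-to-plane inlier distance in meters (assumed value)
    Returns the points that do not lie on the dominant plane, i.e. the objects.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Hypothesize a plane through three randomly sampled points.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                         # degenerate (collinear) sample
            continue
        normal /= norm
        inliers = np.abs((points - p0) @ normal) < dist_thresh
        if inliers.sum() > best_inliers.sum():  # keep the plane with most support
            best_inliers = inliers
    return points[~best_inliers]
```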
1 Introduction
Object recognition and 3D pose estimation are key tasks in industrial applications requiring autonomous robots to understand their surroundings and pursue grasping and manipulation [14]. Indeed, manipulation mandates estimation of the 6-Degrees-Of-Freedom (6DOF) pose (position and orientation) of the objects with respect to the base coordinate system of the robot. This pose estimation should be not only robust to clutter, occlusion and sensor noise, but also efficient enough not to slow down the manipulation process. Most object recognition and pose estimation algorithms rely on matching 2D or 3D features between off-line 3D models (either sets of 3D scans or CAD models) and scene measurements in the form of depth or RGB-D images. In particular, exploitation of color and depth cues from RGB-D images through suitable integration of texture-based and 3D features can yield quite remarkable performance across a variety of benchmark RGB-D datasets [1,24].
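Whatever recognition pipeline is used, the estimated pose ultimately has to be expressed in the base coordinate system of the robot. In practice this amounts to chaining 4x4 homogeneous transforms: the end-effector pose from forward kinematics, a fixed hand-eye transform, and the object pose estimated in the camera frame. A minimal sketch, assuming numpy and a placeholder hand-eye calibration:

```python
import numpy as np

# Hand-eye transform: the camera pose in the end-effector frame. The offset
# below is a placeholder; in practice it comes from hand-eye calibration.
T_ee_cam = np.eye(4)
T_ee_cam[:3, 3] = [0.0, 0.0, 0.05]   # e.g. camera mounted 5 cm beyond the flange

def object_pose_in_base(T_base_ee, T_cam_obj):
    """Express an object pose in the robot base frame by chaining transforms.

    T_base_ee : (4, 4) end-effector pose from the robot's forward kinematics
    T_cam_obj : (4, 4) 6DOF object pose estimated in the camera frame
    """
    return T_base_ee @ T_ee_cam @ T_cam_obj
```

The same chain applied to the camera itself, T_base_cam = T_base_ee @ T_ee_cam, is what lets a highly repeatable arm supply multi-view camera poses directly from kinematics, which is the registration strategy adopted by the proposed system.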
Although providing compelling results on standard benchmarks, the above-mentioned multi-stage, multi-modal, feature-based pipelines turn out to be unsuited to practical real-time industrial applications due to exceedingly slow execution times. Furthermore, due to their reliance on a single vantage point, these approaches may fail when the sought objects are captured under high levels of occlusion [2] and/or their shape and texture do not appear distinctive enough in the chosen view. Finally, an additional nuisance that may hinder the performance of such approaches is the high sensor noise affecting the depth frames provided as input to the algorithms, which tends to significantly distort the 3D surfaces and often causes holes and artifacts [13]. In this paper, we investigate how multi-view 3D reconstruction with an RGB-D sensor mounted on a robotic manipulator can overcome these limitations.
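Fusing many registered depth frames into a TSDF is the standard remedy for exactly this kind of per-frame noise: each voxel stores a truncated signed distance to the surface, updated as a weighted running average over views, so that holes in one frame are filled by others and zero-mean noise averages out. A minimal per-frame update in the style of Curless and Levoy is sketched below; the flat grid layout, constant weighting, and truncation distance are assumptions, not the paper's exact implementation.

```python
import numpy as np

def tsdf_fuse_frame(tsdf, weights, voxel_centers, depth, K, T_cam_world, trunc=0.01):
    """Fuse one registered depth frame into a TSDF volume (weighted running average).

    tsdf, weights : (N,) per-voxel truncated signed distances and weights
    voxel_centers : (N, 3) voxel centers in world (robot base) coordinates
    depth         : (H, W) depth image in meters (0 where the sensor returned no data)
    K             : (3, 3) camera intrinsics
    T_cam_world   : (4, 4) world-to-camera transform, e.g. from robot kinematics
    trunc         : truncation distance in meters (assumed value)
    """
    n = len(voxel_centers)
    pts_h = np.hstack([voxel_centers, np.ones((n, 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]   # voxel centers in the camera frame
    z = pts_cam[:, 2]

    # Project voxel centers into the image plane.
    z_safe = np.where(z > 1e-6, z, np.inf)
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)

    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Signed distance along the viewing ray: measured depth minus voxel depth.
    d_meas = np.zeros(n)
    d_meas[valid] = depth[v[valid], u[valid]]
    sdf = d_meas - z
    valid &= (d_meas > 0) & (sdf > -trunc)  # drop missing data and voxels far behind the surface
    sdf = np.clip(sdf / trunc, -1.0, 1.0)   # truncate and normalize to [-1, 1]

    # Per-voxel weighted running average (constant per-observation weight).
    w_new = weights[valid] + 1.0
    tsdf[valid] = (tsdf[valid] * weights[valid] + sdf[valid]) / w_new
    weights[valid] = w_new
```

The reconstructed surface is then the zero level set of the fused field, extractable with marching cubes; averaging over many registered views is what suppresses the per-frame noise, holes and artifacts described above.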