Three-Dimensional Object Recognition

Abstract Some applications require a position estimate in 3D space (and not just in the 2D image plane), e.g., bin picking applications, where individual objects have to be gripped by a robot from an unordered set of objects. Typically, such applications utilize sensor systems which allow for the generation of 3D data and perform matching in 3D space. Another way to determine the 3D pose of an object is to estimate the projection of the object location in 3D space onto a 2D camera image. Some methods manage to get by with just a single 2D camera image for the estimation of this 3D → 2D mapping transformation; a selection of them is presented in this chapter. They are also examples of correspondence-based schemes, as the matching step is performed by establishing correspondences between scene image and model features. However, instead of using just single scene image and model features, correspondences between special configurations of multiple features are established here. First of all, the SCERPO system makes use of feature groupings which are perceived as similar from a wide variety of viewpoints. Another method, called relational indexing, uses hash tables to speed up the search. Finally, a system called LEWIS derives so-called invariants from specific feature configurations, which are designed such that their values remain stable for differing viewpoints.
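The 3D → 2D mapping mentioned above is, in its simplest form, a perspective (pinhole) projection. The following is a minimal sketch of that idea; the focal length and point coordinates are arbitrary illustration values, not taken from the chapter:

```python
# Perspective (pinhole) projection of 3D points given in camera
# coordinates: a point (x, y, z) maps to (f * x / z, f * y / z),
# where f denotes the focal length.

def project_point(point_3d, focal_length=1.0):
    """Project a 3D point (camera coordinates) onto the 2D image plane."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Two points on the same viewing ray project to the same image
# location, which is why a single 2D image alone does not fix depth.
p1 = project_point((1.0, 2.0, 4.0), focal_length=2.0)
p2 = project_point((2.0, 4.0, 8.0), focal_length=2.0)
```

The last two calls illustrate the depth ambiguity: the methods of this chapter therefore exploit additional constraints (feature groupings, relations, invariants) to recover the pose from a single image.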

5.1 Overview

Before presenting the methods, let’s define what is meant by “3D object recognition” here. The methods presented up to now perform matching of a 2D model to the 2D camera image plane, i.e., the estimated transformation between model and scene image describes a mapping from 2D to 2D. Of course, this is a simplification of reality, where the objects to be recognized are located in a 3D coordinate system (often called world coordinates) and are projected onto a 2D image plane. Some of the methods aim to achieve invariance with respect to out-of-plane rotations in 3D space, e.g., by assuming that the objects to be found are nearly planar. In that case, a change of the object pose can be modeled by a 2D affine transformation. However, the mapping is still from 2D to 2D.

M. Treiber, An Introduction to Object Recognition, Advances in Pattern Recognition, DOI 10.1007/978-1-84996-235-3_5, © Springer-Verlag London Limited 2010
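The claim that an out-of-plane rotation of a nearly planar object can be modeled by a 2D affine transformation can be sketched as follows. Tilting a plane away from the camera compresses its image along one axis (foreshortening); combined with in-plane rotation and translation this yields a general affine map. The angles and matrix entries below are illustrative values, not taken from the text:

```python
import math

def affine_transform(points, a11, a12, a21, a22, tx, ty):
    """Apply the 2D affine map [x', y'] = A @ [x, y] + t to each point."""
    return [(a11 * x + a12 * y + tx, a21 * x + a22 * y + ty)
            for x, y in points]

# A square model contour in the object plane.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Foreshortening by a 60-degree tilt scales one axis by cos(60 deg);
# composing with an in-plane rotation R(theta) gives the affine matrix
# A = R(theta) @ diag(1, cos(tilt)).
theta = math.radians(30.0)        # in-plane rotation
s = math.cos(math.radians(60.0))  # foreshortening factor, 0.5
warped = affine_transform(
    square,
    math.cos(theta), -s * math.sin(theta),
    math.sin(theta),  s * math.cos(theta),
    2.0, 1.0,
)
```

Note that this approximation neglects perspective effects, which is acceptable only when the object is small relative to its distance from the camera.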

In contrast to that, 3D matching describes a mapping from 3D positions to 3D positions. In order to obtain a 3D representation of a scene, well-known methods such as triangulation or binocular stereo can be applied. Please note that many of the methods utilize so-called range images or depth maps, where information about the z-direction (e.g., the z-distance to the sensor) is stored dependent on the x, y position in the image plane. Such a data representation is not “full” 3D yet and is therefore often called 2½D. Another way to determine the 3D pose of an object is to estimate the projection of the object location in 3D space onto the 2D