Hand Pose Estimation from Local Surface Normals
1 Computer Vision Laboratory, D-ITET, ETH Zurich, Zürich, Switzerland
{wanc,vangool}@vision.ee.ethz.ch
2 Department of Computer Science, University of Bonn, Bonn, Germany
[email protected]
3 VISICS, ESAT, K.U. Leuven, Leuven, Belgium
Abstract. We present a hierarchical regression framework for estimating hand joint positions from single depth images based on local surface normals. The hierarchical regression follows the tree-structured topology of the hand, from the wrist to the fingertips. We propose a conditional regression forest, the Frame Conditioned Regression Forest (FCRF), which uses a new normal difference feature. At each stage of the regression, the frame of reference is established from either the local surface normal or previously estimated hand joints. Because the regression is performed with respect to this local frame, the pose estimation is more robust to rigid transformations. We also introduce a new efficient approximation for estimating surface normals. We verify the effectiveness of our method by conducting experiments on two challenging real-world datasets and show consistent improvements over previous discriminative pose estimation methods.
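To illustrate why regressing in a local frame helps with rigid transformations, the following minimal sketch expresses a joint position relative to a frame built from a surface normal and a second direction, then checks that the local coordinates are unchanged when the whole hand is rotated and translated. This is not the FCRF itself: the function names (`local_frame`, `to_local`, `rigid`), the use of a second direction to fix the in-plane rotation, and all numeric values are assumptions made for illustration only.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def local_frame(normal, direction):
    """Orthonormal frame (x, y, z): z follows the normal, x is the
    direction projected onto the tangent plane (hypothetical choice)."""
    z = normalize(normal)
    proj = dot(direction, z)
    x = normalize([direction[i] - proj * z[i] for i in range(3)])
    y = cross(z, x)
    return x, y, z

def to_local(point, origin, frame):
    """Coordinates of point relative to origin, expressed in frame."""
    d = sub(point, origin)
    return [dot(d, axis) for axis in frame]

def rigid(p, angle, t):
    """Rotate p about the camera z-axis by angle, then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1] + t[0],
            s * p[0] + c * p[1] + t[1],
            p[2] + t[2]]

# Illustrative values (assumed, not taken from any dataset).
origin = [0.00, 0.00, 0.50]      # e.g. a previously estimated joint
joint = [0.03, 0.08, 0.48]       # joint to be regressed
normal = [0.0, 0.6, -0.8]        # local surface normal
direction = [1.0, 0.2, 0.0]      # fixes the in-plane rotation

frame = local_frame(normal, direction)
local_before = to_local(joint, origin, frame)

# Apply the same rigid motion to every quantity.
angle, shift, zero = 0.7, [0.1, -0.2, 0.05], [0.0, 0.0, 0.0]
frame_moved = local_frame(rigid(normal, angle, zero),
                          rigid(direction, angle, zero))
local_after = to_local(rigid(joint, angle, shift),
                       rigid(origin, angle, shift), frame_moved)
```

The camera-space coordinates of the joint change under the motion, while `local_before` and `local_after` agree up to floating-point error, so a regressor trained on local offsets would see the same target in both cases.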
1 Introduction
We consider the problem of 3D hand pose estimation from single depth images. Hand pose estimation has important applications in human-computer interaction (HCI) and augmented reality (AR). Estimating the pose of a freely moving hand poses several challenges, including large viewpoint variance, finger similarity, self-occlusion, and versatile and rapid finger articulation.

Methods for hand pose estimation from depth generally fall into two camps. The first is frame-to-frame model-based tracking [1–5]. Model-based tracking approaches can be highly accurate if given enough computational resources for the optimization. The second camp, where our work also falls, is single-frame discriminative pose estimation [6–9]. These methods are less accurate than model-based trackers but much faster, and they target real-time performance without GPUs. Model-based tracking and discriminative pose estimation are complementary, and there have been notable hybrid methods [10–14] which try to maintain the advantages of both camps.

Earlier methods for discriminative hand pose estimation tried to estimate all joints directly [15,16], though such approaches tend to fail with dramatic viewpoint changes and extreme articulations. Following the lead of several notable

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 554–569, 2016. DOI: 10.1007/978-3-319-46487-9_34
[Figure 1. Panel (a): hand skeleton with joints labeled Wrist, MCP, PIP, DIP and TIP. Panel (b): pipeline diagram with the labels regression, frame estimation, in-plane rotation, Palm Frame, Normal Estimation, Wrist Estimation, Palm Estimation, MCP Estimation, PIP Estimation, DIP Estimation and Finger Estimation, drawn with local x, y, z axes.]
Fig. 1. Framework. (a) Shows the hand skeleton model used in our work. (b) Sketches our hierarchical regression framework, with each