Hand Pose Estimation from Local Surface Normals


1 Computer Vision Laboratory, D-ITET, ETH Zurich, Zürich, Switzerland
{wanc,vangool}@vision.ee.ethz.ch
2 Department of Computer Science, University of Bonn, Bonn, Germany
[email protected]
3 VISICS, ESAT, K.U. Leuven, Leuven, Belgium

Abstract. We present a hierarchical regression framework for estimating hand joint positions from single depth images based on local surface normals. The hierarchical regression follows the tree-structured topology of the hand from the wrist to the fingertips. We propose a conditional regression forest, the Frame Conditioned Regression Forest (FCRF), which uses a new normal difference feature. At each stage of the regression, the frame of reference is established from either the local surface normal or previously estimated hand joints. Because the regression is performed with respect to this local frame, the pose estimation is more robust to rigid transformations. We also introduce a new efficient approximation for estimating surface normals. We verify the effectiveness of our method by conducting experiments on two challenging real-world datasets and show consistent improvements over previous discriminative pose estimation methods.
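To make the robustness claim concrete, the following is a minimal sketch, not the FCRF itself: it expresses a joint offset in a frame built from a surface normal and checks that the local coordinates are unchanged when the whole scene is rotated and translated. The helper names (`local_frame`, `to_local`, `rotation`), the use of a reference direction to fix the in-plane rotation, and all numeric values are assumptions made for illustration only.

```python
import numpy as np

def local_frame(normal, reference):
    """Build an orthonormal frame whose z-axis is the surface normal.

    `reference` is a hypothetical direction not parallel to the normal
    (for example, towards a previously estimated joint); it fixes the
    in-plane rotation around the normal.
    """
    z = normal / np.linalg.norm(normal)
    x = reference - np.dot(reference, z) * z  # remove the component along z
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)                        # completes a right-handed frame
    return np.stack([x, y, z])                # rows are the frame axes

def to_local(frame, origin, point):
    """Coordinates of `point` relative to `origin`, in the local frame."""
    return frame @ (point - origin)

def rotation(angle_z, angle_x):
    """Rotation about the x-axis followed by a rotation about the z-axis."""
    cz, sz = np.cos(angle_z), np.sin(angle_z)
    cx, sx = np.cos(angle_x), np.sin(angle_x)
    rot_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return rot_z @ rot_x

# Illustrative values only (not taken from the paper).
normal = np.array([0.2, -0.1, 1.0])
reference = np.array([1.0, 0.3, 0.0])
origin = np.array([10.0, 20.0, 300.0])   # surface point where the frame sits
joint = np.array([25.0, 5.0, 310.0])     # joint position to be regressed

offset_before = to_local(local_frame(normal, reference), origin, joint)

# Apply the same rigid transformation to every quantity in the scene.
R = rotation(0.7, -0.4)
t = np.array([-50.0, 12.0, 80.0])
offset_after = to_local(
    local_frame(R @ normal, R @ reference), R @ origin + t, R @ joint + t
)

# The offset in camera coordinates, by contrast, changes with the rotation.
camera_offset_before = joint - origin
camera_offset_after = (R @ joint + t) - (R @ origin + t)
```

In this sketch the regression target `offset_before` equals `offset_after`, while the camera-space offset does not survive the rotation, which is the intuition behind regressing in a local frame.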

1 Introduction

We consider the problem of 3D hand pose estimation from single depth images. Hand pose estimation has important applications in human-computer interaction (HCI) and augmented reality (AR). Estimating the pose of a freely moving hand poses several challenges, including large viewpoint variance, finger similarity, self-occlusion, and versatile and rapid finger articulation. Methods for hand pose estimation from depth generally fall into two camps. The first is frame-to-frame model-based tracking [1–5]. Model-based tracking approaches can be highly accurate if given enough computational resources for the optimization. The second camp, where our work also falls, is single-frame discriminative pose estimation [6–9]. These methods are less accurate than model-based trackers but much faster, and they are targeted towards real-time performance without GPUs. Model-based tracking and discriminative pose estimation are complementary to each other, and there have been notable hybrid methods [10–14] which try to maintain the advantages of both camps.

Earlier methods for discriminative hand pose estimation tried to estimate all joints directly [15,16], though such approaches tend to fail with dramatic viewpoint changes and extreme articulations. Following the lead of several notable

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 554–569, 2016. DOI: 10.1007/978-3-319-46487-9_34


[Figure 1 image not recoverable. Surviving labels: joints Wrist, MCP, PIP, DIP, TIP; Palm Frame; stages Normal Estimation, Wrist Estimation, Palm Estimation, MCP Estimation, PIP Estimation, DIP Estimation, Finger Estimation; steps regression, frame estimation, in-plane rotation; axes x, y, z; panels (a) and (b).]

Fig. 1. Framework. (a) Shows the hand skeleton model used in our work. (b) Sketches our hierarchical regression framework, with each
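The wrist-to-fingertip ordering of the hierarchy can be sketched as a traversal of the hand's kinematic tree. The snippet below is only an illustration under stated assumptions: the joint names, the use of the same MCP, PIP, DIP, TIP chain for all five fingers, and the breadth-first staging are hypothetical, and the palm and frame estimation steps are omitted. It shows only that every joint is visited after the joint it depends on.

```python
from collections import deque

# Hypothetical naming: five fingers, each with the chain MCP -> PIP -> DIP -> TIP.
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
CHAIN = ["MCP", "PIP", "DIP", "TIP"]

def build_parent_map():
    """Map each joint to its parent; the wrist is the root of the tree."""
    parent = {"wrist": None}
    for finger in FINGERS:
        previous = "wrist"
        for joint in CHAIN:
            name = f"{finger}_{joint}"
            parent[name] = previous
            previous = name
    return parent

def stage_order(parent):
    """Breadth-first order, so each joint comes after its parent."""
    children = {}
    for joint, par in parent.items():
        children.setdefault(par, []).append(joint)
    order = []
    queue = deque(children[None])  # start from the root (wrist)
    while queue:
        joint = queue.popleft()
        order.append(joint)
        queue.extend(children.get(joint, []))
    return order

PARENT = build_parent_map()
ORDER = stage_order(PARENT)
```

With this ordering, a stage-wise estimator could process the wrist first, then all MCP joints, then PIP, DIP and TIP, each time conditioning on the already estimated parent joint.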
