Towards Viewpoint Invariant 3D Human Pose Estimation
Abstract. We propose a viewpoint invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our approach leverages a convolutional and recurrent network architecture with a top-down error feedback mechanism to self-correct previous pose estimates in an end-to-end manner. We evaluate our model on a previously published depth dataset and a newly collected human pose dataset containing 100K annotated depth images from extreme viewpoints. Experiments show that our model achieves competitive performance on frontal views while achieving state-of-the-art performance on alternate viewpoints.
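The abstract compresses the whole pipeline into one sentence, so a minimal PyTorch sketch of the general idea may help: a CNN embeds the depth image, and a recurrent cell repeatedly emits corrections to the running pose estimate (the top-down error feedback). Everything below (module names, layer sizes, num_joints=15, the four refinement steps) is an illustrative assumption, not the authors' architecture, which additionally uses learned local-region embeddings and multi-task partial-pose outputs.

```python
import torch
import torch.nn as nn

class IterativePoseEstimator(nn.Module):
    """Sketch of a conv + recurrent pose estimator with error feedback.

    A CNN embeds the depth image; an LSTM cell refines the pose over
    several iterations, each step consuming the image embedding plus the
    current estimate and emitting a correction (delta) to that estimate.
    """

    def __init__(self, num_joints=15, embed_dim=256, hidden_dim=256, steps=4):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Sequential(  # depth image -> feature vector
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.rnn = nn.LSTMCell(embed_dim + 3 * num_joints, hidden_dim)
        self.delta = nn.Linear(hidden_dim, 3 * num_joints)  # pose correction

    def forward(self, depth):
        b = depth.size(0)
        feat = self.encoder(depth)
        pose = depth.new_zeros(b, self.delta.out_features)  # initial guess
        h = feat.new_zeros(b, self.rnn.hidden_size)
        c = torch.zeros_like(h)
        estimates = []
        for _ in range(self.steps):
            h, c = self.rnn(torch.cat([feat, pose], dim=1), (h, c))
            pose = pose + self.delta(h)  # feedback: correct previous estimate
            estimates.append(pose)       # supervise every iteration
        return estimates


# Usage: one 224x224 depth image, four refinement steps.
model = IterativePoseEstimator()
preds = model(torch.randn(1, 1, 224, 224))
print(preds[-1].shape)  # torch.Size([1, 45]) -> 15 joints x (x, y, z)
```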
1 Introduction
Depth sensors are becoming ubiquitous in applications ranging from security to robotics and from entertainment to smart spaces [5]. While recent advances in pose estimation have improved performance on front and side views, most real-world settings present challenging viewpoints such as top or angled views in retail stores, hospital environments, or airport settings. These viewpoints introduce high levels of self-occlusion, making human pose estimation difficult for existing algorithms. Humans are remarkably good at predicting full rigid-body and articulated poses in these challenging scenarios. However, most work in the human pose estimation literature has addressed relatively constrained settings. There has been a long line of work on generative pose models, where a pose is estimated by constructing a skeleton using templates or priors in a top-down manner [12,16,18,19]. In contrast, discriminative methods directly identify individual body parts, labels, or positions and construct the skeleton in a bottom-up approach [14,15,51,52,54]. However, recent research in both classes focuses primarily
on frontal views with few occlusions, despite the abundance of occlusion and partial-pose research in object detection [2–4,7,9,22,23,32,53,61]. Even modern representation learning techniques address human pose estimation only from frontal or side views [10,17,34,41,42,59,60]. While the above methods improve human pose estimation, they fail to address viewpoint variance. In this work we address the problem of viewpoint invariant pose estimation from single depth images. There are two challenges towards this goal. The first is designing a model that is not only rich enough to reason about 3D spatial information but also robust to viewpoint changes. The model must understand both local and global human pose structure.
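The introduction does not yet say how a viewpoint invariant feature space might be learned, but one common way to realize the abstract's "embeds local regions into a learned viewpoint invariant feature space" is a metric-learning objective: patches of the same body part seen from different viewpoints should embed close together, patches of different parts far apart. The sketch below uses a hypothetical triplet loss for this; the names LocalRegionEmbedder and viewpoint_invariance_loss, the 32x32 patch size, and the margin are all assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalRegionEmbedder(nn.Module):
    """Embed local depth patches into a shared feature space (sketch)."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, embed_dim),
        )

    def forward(self, patches):  # (B, 1, 32, 32) local depth patches
        return F.normalize(self.net(patches), dim=1)  # unit-length embeddings


def viewpoint_invariance_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: the same body part seen from two viewpoints should
    embed closer together than patches of two different body parts."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()


# Usage: anchor/positive are the same joint from two viewpoints;
# negative is a patch of a different joint.
embedder = LocalRegionEmbedder()
a, p, n = (torch.randn(8, 1, 32, 32) for _ in range(3))
loss = viewpoint_invariance_loss(embedder(a), embedder(p), embedder(n))
loss.backward()
```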