Monocular 3D Tracking of Articulated Human Motion in Silhouette and Pose Manifolds


Research Article

Feng Guo¹ and Gang Qian¹,²

¹ Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-9309, USA
² Arts, Media and Engineering Program, Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-8709, USA

Correspondence should be addressed to Gang Qian, [email protected]

Received 1 February 2007; Revised 24 July 2007; Accepted 29 January 2008

Recommended by Nikos Nikolaidis

This paper presents a robust computational framework for monocular 3D tracking of human movement. The main innovation of the proposed framework is to explore the underlying data structures of the body silhouette and pose spaces by constructing low-dimensional silhouette and pose manifolds, establishing intermanifold mappings, and performing tracking in such manifolds using a particle filter. In addition, a novel vectorized silhouette descriptor is introduced to achieve a low-dimensional, noise-resilient silhouette representation. The proposed articulated motion tracker is view-independent, self-initializing, and capable of maintaining multiple kinematic trajectories. By using the learned mapping from the silhouette manifold to the pose manifold, particle sampling is informed by the current image observation, resulting in improved sample efficiency. Decent tracking results have been obtained using synthetic and real videos.

Copyright © 2008 F. Guo and G. Qian. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
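To make the data flow described in the abstract concrete, the following Python sketch mirrors the three learned components at a toy scale: low-dimensional silhouette and pose embeddings, a mapping from the silhouette embedding to the pose embedding, and observation-informed particle proposals. It is only an illustration under stated assumptions, not the authors' implementation: PCA stands in for the manifold learning step, a ridge-regularized linear map stands in for the intermanifold mapping, and every array shape, parameter, and function name is hypothetical.

import numpy as np

def pca_embed(X, dim):
    # Project rows of X onto their top `dim` principal components
    # (a stand-in for the manifold construction used in the paper).
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:dim]
    return Xc @ components.T, mean, components

def fit_linear_map(Z_sil, Z_pose, reg=1e-3):
    # Ridge-regularized least-squares map from silhouette-manifold
    # coordinates to pose-manifold coordinates.
    A = Z_sil
    return np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Z_pose)

def propose_particles(sil, sil_mean, sil_comp, W, n_particles=100, noise=0.05):
    # Embed the observed silhouette descriptor, map it to the pose manifold,
    # and scatter particles around the predicted point (observation-informed sampling).
    z_sil = (sil - sil_mean) @ sil_comp.T
    z_pose_pred = z_sil @ W
    return z_pose_pred + noise * np.random.randn(n_particles, W.shape[1])

# Toy usage with random stand-in data: 200 training frames,
# 1000-D silhouette descriptors, 40-D joint-angle poses, 3-D manifolds.
rng = np.random.default_rng(0)
S = rng.standard_normal((200, 1000))   # silhouette descriptors (training)
P = rng.standard_normal((200, 40))     # corresponding joint-angle poses (training)
Z_sil, s_mean, s_comp = pca_embed(S, dim=3)
Z_pose, p_mean, p_comp = pca_embed(P, dim=3)
W = fit_linear_map(Z_sil, Z_pose)
particles = propose_particles(S[0], s_mean, s_comp, W)   # 100 pose-manifold samples

In the paper itself, the silhouette descriptor, the manifold construction, and the intermanifold mapping are learned with the methods described in the later sections; the sketch only reflects how an image observation can drive particle sampling in the pose manifold.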

1. INTRODUCTION

Reliable recovery and tracking of articulated human motion from video is considered a very challenging problem in computer vision, due to the versatility of human movement, the variability of body types, the diversity of movement styles and signatures, and the 3D nature of the human body. Vision-based tracking of articulated motion is a temporal inference problem, and numerous computational frameworks have been proposed to address it. Some of these frameworks make use of training data (e.g., [1]) to inform the tracking, while others attempt to infer the articulated motion directly without using any training data (e.g., [2]). When training data are available, articulated motion tracking can be cast as a statistical learning and inference problem: given a set of training examples, a learning and inference framework needs to be developed to track both seen and unseen movements performed by known or unknown subjects. In terms of the learning and inference structure, existing 3D tracking algorithms can be roughly clustered into two categories, namely generative and discriminative approaches. Generative approaches, for example [2–4], usually assume knowledge of a 3D body model of the subject and dynamical models of the related movement, from which kinematic predictions and corresponding image observations can be