Visual Analysis of Humans: Looking at People

Understanding human activity from video is one of the central problems in the field of computer vision, driven by a wide variety of applications in communications, entertainment, security, commerce, and athletics. This unique text/reference provides a cohe



Part-Based Models for Finding People and Estimating Their Pose

Deva Ramanan

Abstract

This chapter surveys approaches to person detection and pose estimation that use part-based models. After a brief motivation for the need for parts, the bulk of the chapter is split into three core sections on Representation, Inference, and Learning. We begin by describing various gradient-based and color descriptors for parts. We next focus on representations for encoding structural relations between parts, describing extensions of classic pictorial structures models that capture occlusion and appearance relations. We use the formalism of probabilistic models to unify such representations and to introduce the issues of inference and learning. We describe various efficient algorithms designed for tree-structured models, and focus on discriminative formalisms for learning model parameters. We end with applications to pedestrian detection, human pose estimation, and people tracking.

D. Ramanan
Department of Computer Science, University of California, Irvine, USA
e-mail: [email protected]

T.B. Moeslund et al. (eds.), Visual Analysis of Humans, DOI 10.1007/978-0-85729-997-0_11, © Springer-Verlag London Limited 2011

11.1 Introduction

Part models date back to the generalized cylinder models of Binford [3] and Marr and Nishihara [40] and the pictorial structures of Fischler and Elschlager [24] and Felzenszwalb and Huttenlocher [18]. The basic premise is that objects can be modeled as a collection of local templates that deform and articulate with respect to one another (Fig. 11.1).

Contemporary work: Part-based models have appeared in recent history under various formalisms. Felzenszwalb and Huttenlocher [18] directly use the pictorial structure moniker, but also notably develop efficient inference algorithms for matching them to images. Constellation models [7, 20, 64] take the same approach, but use a sparse set of parts defined at keypoint locations. Body plans [25] are another representation that encodes particular geometric rules for defining valid deformations of local templates.
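The basic premise of this section, local templates that deform with respect to one another, can be illustrated with a minimal sketch. It scores one candidate placement of all parts as the sum of local template responses minus a spring-like penalty on each connected pair. The function name, the argument layout, and the quadratic form of the penalty are assumptions made for illustration only; they are not taken from the models cited above.

```python
import numpy as np


def configuration_score(part_scores, locations, edges, rest_offsets, stiffness):
    """Score one placement of all parts (hypothetical illustration).

    part_scores:  list of 2-D arrays; part_scores[i][y, x] is the local
                  template response of part i at pixel (x, y).
    locations:    list of (x, y) integer positions, one per part.
    edges:        list of (i, j) pairs of parts joined by a spring.
    rest_offsets: dict mapping (i, j) to the ideal (dx, dy) of part j
                  relative to part i.
    stiffness:    dict mapping (i, j) to a non-negative spring weight.
    """
    # Appearance term: how well each local template matches at its location.
    appearance = sum(part_scores[i][y, x] for i, (x, y) in enumerate(locations))

    # Deformation term: assumed quadratic penalty for stretching each spring
    # away from its rest offset.
    deformation = 0.0
    for (i, j) in edges:
        xi, yi = locations[i]
        xj, yj = locations[j]
        dx, dy = rest_offsets[(i, j)]
        deformation += stiffness[(i, j)] * ((xj - xi - dx) ** 2 + (yj - yi - dy) ** 2)

    return float(appearance - deformation)


if __name__ == "__main__":
    # Two parts on a 5x5 grid; part 1 ideally sits two pixels below part 0.
    head = np.zeros((5, 5))
    torso = np.zeros((5, 5))
    head[1, 1] = 2.0   # strong head response at (x=1, y=1)
    torso[3, 1] = 3.0  # strong torso response at (x=1, y=3)

    edges = [(0, 1)]
    rest = {(0, 1): (0, 2)}
    k = {(0, 1): 0.5}

    print(configuration_score([head, torso], [(1, 1), (1, 3)], edges, rest, k))
    print(configuration_score([head, torso], [(1, 1), (2, 3)], edges, rest, k))
```

In this toy setup the first placement sits exactly at the rest offset, so it keeps both template responses and pays no deformation cost. The second placement moves the torso one pixel sideways, losing the torso response and paying for the stretched spring. Matching a model to an image amounts to searching for the highest-scoring placement, and doing that search efficiently is the inference problem.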


Star models: A particularly common form of geometric constraint is known as a “star model”, which states that part placements are independent within some root coordinate frame. Visually speaking, one can think of springs connecting each part to some root bounding box. This geometric model can be implicitly encoded in an implicit shape model [38]. One advantage of the implicit encoding is that one can typically deal with a large vocabulary of parts, sometimes known as a codebook of visual words [57]. Such codebooks are often generated by clustering candidate patches typically found in images of people. Poselets [4] are a recent successful extension of such a model, where part models are trained discriminatively using fully supervised data, eliminating the need for codebook generation through clustering. K-fan models generalize star models [9] by modeling part placements as inde