SDM3D: shape decomposition of multiple geometric priors for 3D pose estimation




ORIGINAL ARTICLE

SDM3D: shape decomposition of multiple geometric priors for 3D pose estimation

Mengxi Jiang¹ · Zhuliang Yu² · Cuihua Li¹ · Yunqi Lei¹

Received: 12 November 2019 / Accepted: 4 June 2020

© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Recovering the 3D human pose from a single image with 2D joints is a challenging task in computer vision applications. The sparse representation (SR) model has been successfully adopted in 3D pose estimation approaches. However, since existing training 3D data are often collected in constrained environments (i.e., indoors) with limited diversity of subjects and actions, most SR-based approaches generalize poorly to real-world scenarios that may contain more complex cases. To alleviate this issue, this paper proposes SDM3D, a novel shape decomposition using multiple geometric priors for 3D pose estimation. SDM3D makes a new attempt by separating a 3D pose into a global structure and body deformations that are encoded explicitly via different prior constraints. Furthermore, a joint learning strategy is designed to learn two over-complete dictionaries from training data to capture more geometric prior information. We have evaluated SDM3D on four well-recognized benchmarks, i.e., Human3.6M, HumanEva-I, CMU MoCap, and MPII. The experimental results show the effectiveness of SDM3D.

Keywords: 3D pose estimation · Sparse representation model · Shape decomposition model · Multiple geometric learning

Corresponding author: Yunqi Lei ([email protected])

1 Department of Computer Science, Xiamen University, Xiamen 361005, China
2 College of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China

1 Introduction

In many computer vision applications, such as human–robot interaction [16], virtual reality (VR) [18], and video surveillance [32], it is crucial to analyze and reconstruct the 3D object [1, 9, 28, 42, 59]. In particular, a fundamental task is to recover the 3D human pose from a single 2D image containing the human activity [42]. Typically, for 2D body joints given/detected from an image, the 3D human pose is estimated by learning a mapping from these 2D

observations to 3D space [4, 20, 29, 37, 43, 46], or by designing a model-based reasoning procedure, e.g., linear models [40, 53, 68] and sophisticated dimensionality reduction methods [4, 24, 49].

Due to their advantage in capturing complex pose variability, sparse representation (SR)-based approaches have shown their effectiveness in recovering 3D poses [17, 21, 40, 66, 68]. These approaches estimate a 3D human pose as a whole by fitting its projections to given/detected 2D landmarks, in which the 3D pose is represented as a linear combination of a set of 3D geometry bases learned from existing motion capture (MoCap) datasets [12, 19, 45]. Under the assumption of sufficient training classes [62], the sparsity constra
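To make the SR formulation above concrete, the following is a minimal sketch of fitting sparse coefficients for a pose over an over-complete basis via ISTA (iterative soft-thresholding). This is not the authors' implementation: the function names, the toy random dictionary, and the choice of ISTA as the solver are all illustrative assumptions; in practice the bases would be learned from MoCap data.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_code(obs, basis, lam=0.01, n_iter=500):
    """Estimate sparse coefficients c such that basis @ c approximates obs.

    obs   : (d,)   flattened joint observations
    basis : (d, k) over-complete dictionary of geometry bases (k > d)
    lam   : l1 regularization weight enforcing sparsity
    """
    d, k = basis.shape
    # Step size: 1 / Lipschitz constant of the gradient of the data term
    step = 1.0 / np.linalg.norm(basis, 2) ** 2
    c = np.zeros(k)
    for _ in range(n_iter):
        grad = basis.T @ (basis @ c - obs)          # gradient of 0.5*||basis@c - obs||^2
        c = soft_threshold(c - step * grad, step * lam)
    return c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 10, 30
    basis = rng.standard_normal((d, k))
    basis /= np.linalg.norm(basis, axis=0)          # unit-norm atoms
    c_true = np.zeros(k)
    c_true[[2, 11, 25]] = [1.0, -0.5, 0.8]          # a pose built from 3 bases
    obs = basis @ c_true
    c_hat = sparse_code(obs, basis)
    print("reconstruction error:", np.linalg.norm(basis @ c_hat - obs))
```

Because the dictionary is over-complete (k > d), the l1 penalty is what makes the coefficients identifiable; with lam set to zero the problem would be underdetermined.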