3D hypothesis clustering for cross-view matching in multi-person motion capture


Vol. 6, No. 2, June 2020, 147–156

Research Article

Miaopeng Li¹, Zimeng Zhou¹, and Xinguo Liu¹ (✉)

© The Author(s) 2020.

Abstract We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine cross-view correspondences for the 2D joints in the presence of noise. We propose a 3D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2D space into a clustering problem in a 3D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to s

Keywords multi-person motion capture; cross-view matching; clustering; human pose estimation

1 Introduction

Multi-person motion capture estimates the articulated joint positions and/or angles for a group of people from video. It is an important yet challenging task with many applications, such as human–computer interaction, action recognition, emotion analysis, and human performance analysis.

The latest work shows that markerless motion capture is feasible for a single person in weakly controlled environments, but remains very difficult for a group of people in uncontrolled environments because of the increased complexity in occlusion, appearance, motion, shape, and scale. In this paper, we focus on markerless motion capture for multiple people with a multiview setup. Past approaches typically solve this problem in two stages: the first stage detects 2D body keypoints or poses in each view for all persons, and the second stage matches them across views to reconstruct 3D poses. As deep-learning-based keypoint and pose detection techniques have greatly advanced [1–6], the remaining challenge is to resolve the correspondences between detected keypoints or poses across different views and different persons. Most previous methods employ a 3D pictorial structure (3DPS) model to solve the correspondence problem implicitly, by reasoning about all 3D hypotheses that are geometrically compatible with the detected 2D information [7–11]. However, 3DPS-based approaches are computationally expensive due to the huge state space. In addition, they are not robust, especially when there are few cameras, because they link the detected 2D joints based only on multiview geometry and ignore appearance cues.

This paper presents a 3D hypothesis clustering technique to efficiently and robustly determine the cross-view correspondences between the detected joints. The proposed technique transforms the correspondence problem from 2D space to 3D and solves it with a 3D hypothesis clustering algorithm that incorporates appearance evidence, multiview geometry, and bone length information. Each resulting cluster is a set of 3D points, which

¹ State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China.
Manuscript received: 2020-03-23; accepted: 2020-03-28
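The core idea of turning 2D cross-view joint matching into clustering in a 3D hypothesis space can be illustrated with a minimal sketch. It assumes calibrated cameras with known 3×4 projection matrices and handles one joint type at a time. Every cross-view pair of 2D detections is triangulated with the standard linear (DLT) method, pairs that reproject poorly are discarded as geometrically incompatible, and the surviving 3D hypotheses are grouped by single-linkage clustering with a fixed radius. This is not the authors' algorithm: it uses multiview geometry only and omits the appearance and bone-length evidence the paper integrates. The function names, the thresholds `max_err` and `radius`, and the synthetic three-camera scene are all hypothetical.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]


def triangulate_dlt(P_a, P_b, x_a, x_b):
    """Linear (DLT) triangulation of one 3D point from two pixel observations."""
    A = np.stack([
        x_a[0] * P_a[2] - P_a[0],
        x_a[1] * P_a[2] - P_a[1],
        x_b[0] * P_b[2] - P_b[0],
        x_b[1] * P_b[2] - P_b[1],
    ])
    # Homogeneous solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def build_hypotheses(projections, detections, max_err=2.0):
    """Triangulate every cross-view pair of detections of one joint type.

    detections[v] is an (n_v, 2) array of 2D detections in view v. Pairs whose
    worse reprojection error exceeds max_err pixels (hypothetical threshold) are
    dropped. Returns a list of (X, (view_a, i), (view_b, j)) tuples.
    """
    hypotheses = []
    n_views = len(projections)
    for a in range(n_views):
        for b in range(a + 1, n_views):
            for i, x_a in enumerate(detections[a]):
                for j, x_b in enumerate(detections[b]):
                    X = triangulate_dlt(projections[a], projections[b], x_a, x_b)
                    err = max(np.linalg.norm(project(projections[a], X) - x_a),
                              np.linalg.norm(project(projections[b], X) - x_b))
                    if err < max_err:
                        hypotheses.append((X, (a, i), (b, j)))
    return hypotheses


def cluster_hypotheses(points, radius=0.15):
    """Single-linkage clustering (union-find): points closer than radius are joined."""
    parent = list(range(len(points)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if np.linalg.norm(points[i] - points[j]) < radius:
                parent[find(i)] = find(j)
    clusters = {}
    for k in range(len(points)):
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())


# Hypothetical synthetic scene: three translated pinhole cameras (focal length
# 500 px, principal point at the origin) and one joint of each of two people.
def camera(center, f=500.0):
    K = np.diag([f, f, 1.0])
    C = np.asarray(center, dtype=float).reshape(3, 1)
    return K @ np.hstack([np.eye(3), -C])


projections = [camera([-1, 0, 0]), camera([1, 0, 0]), camera([0, 1, 0])]
joints = [np.array([0.0, 0.0, 5.0]), np.array([0.6, 0.9, 5.0])]
detections = [np.array([project(P, X) for X in joints]) for P in projections]

hyps = build_hypotheses(projections, detections)
clusters = cluster_hypotheses([h[0] for h in hyps])
# Expected: two clusters, one per person, each built from all three view pairs.
for c in clusters:
    centroid = np.mean([hyps[k][0] for k in c], axis=0)
    print(f"{len(c)} hypotheses, centroid {np.round(centroid, 3)}")
```

In this toy scene, pairing one person's detection with another person's detection violates the epipolar constraint, so the triangulated point reprojects far from at least one detection and is rejected. Each surviving cluster then collects the consistent hypotheses for a single person's joint. With few cameras, or with people lying close to a shared epipolar line, geometry alone becomes ambiguous, which is the situation the appearance and bone-length cues are meant to resolve.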