Uncalibrated multi-view multiple humans association and 3D pose estimation by adversarial learning
Sara Ershadi-Nasab1 · Shohreh Kasaei2 · Esmaeil Sanaei1
Received: 7 November 2019 / Revised: 10 August 2020 / Accepted: 26 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

Multiple human 3D pose estimation is a useful but challenging task in computer vision applications. The ambiguities in estimating the 2D and 3D poses of multiple persons can be resolved by using multi-view frames, in which the occluded or self-occluded body parts of some persons might be visible in other camera views. However, when the cameras are moving and uncalibrated, estimating the association of multiple human body parts among different camera views is a challenging task. This paper presents novel methods for multiple human 3D pose estimation and pose association in multi-view camera frames in an uncalibrated camera setup using an adversarial learning framework. The generator is a 3D pose estimation network that learns a mapping of distance and angular difference matrices between 2D and 3D spaces. The discriminator tries to distinguish the predicted 3D poses from the ground truth, which forces the pose estimator to generate valid 3D poses. To increase the accuracy of the generator network, multi-view frames are used. The estimated 3D poses are associated among multi-view frames by a statistical method. The association and the relative rotation and translation of the cameras with respect to each other are also obtained. This step strengthens the generator network and removes ambiguities in the estimation of occluded or self-occluded body parts. The global 3D poses are the inputs to the discriminator network, with the aim of fooling the discriminator into treating them as ground-truth poses. Experimental results on multi-view and multi-person datasets (such as Campus, Shelf, Utrecht Multi-Person Motion (UMPM), and KTH Football 2) indicate that the proposed method achieves superior performance in comparison with other state-of-the-art methods, while it does not require any calibration information a priori.

Keywords 3D pose estimation · Multi-view · Human associations · Uncalibrated cameras · Generative adversarial
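The abstract states that the generator maps distance matrices between 2D and 3D spaces. As a rough illustration of what a Euclidean distance matrix (EDM) over body joints looks like, here is a minimal NumPy sketch. The function name `euclidean_distance_matrix`, the three-joint toy pose, and the rotation and shift values are assumptions made for illustration and are not taken from the paper; the angular difference matrix is not sketched because its definition is not given in this excerpt.

```python
import numpy as np


def euclidean_distance_matrix(joints):
    """Return the (J, J) matrix of pairwise Euclidean distances.

    `joints` has shape (J, D): J joints with D coordinates each
    (D = 2 for image-plane poses, D = 3 for 3D poses).
    """
    joints = np.asarray(joints, dtype=float)
    # Broadcasting builds all pairwise difference vectors, shape (J, J, D).
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical toy pose with three collinear 2D joints.
pose_2d = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
edm = euclidean_distance_matrix(pose_2d)

# Rotate and shift the same pose; the distances between joints do not change.
theta = np.deg2rad(40.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
pose_moved = pose_2d @ rotation.T + np.array([10.0, -2.0])
edm_moved = euclidean_distance_matrix(pose_moved)
```

An EDM is unchanged by rotating or translating the whole point set, which is one plausible reason such a representation is attractive when the camera pose is unknown.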
Shohreh Kasaei
[email protected]
Extended author information available on the last page of the article.

1 Introduction

For better readability of the paper, the abbreviation list is provided in Table 1.
Multimedia Tools and Applications

Table 1 Abbreviation list

Full form                               Acronyms
Procrustes analysis                     PA
Generative adversarial network          GAN
Euclidean distance matrix               EDM
Angular difference matrix               ADM
Convolutional neural network            CNN
3D Pictorial structure                  3DPS
Structured support vector machine       SSVM
Expectation-Maximization                EM
Utrecht Multi-Person Motion             UMPM
Singular value decomposition            SVD
Percentage of correct estimated parts   PCP
Batch normalization                     BN
Rectified linear unit                   ReLU
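Table 1 lists Procrustes analysis (PA) and singular value decomposition (SVD), and the abstract mentions recovering the relative rotation and translation between cameras. The sketch below shows the standard closed-form rigid Procrustes alignment of two sets of corresponding 3D points using SVD. It is a generic textbook construction offered under assumptions, not necessarily the procedure used in the paper; the function name `rigid_procrustes`, the 14-joint random pose, and the test rotation and translation are hypothetical.

```python
import numpy as np


def rigid_procrustes(source, target):
    """Estimate R (3x3 rotation) and t (3,) so that target ≈ R @ source_i + t.

    `source` and `target` are (N, 3) arrays of corresponding 3D points.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (source - mu_s).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t


# Hypothetical example: a random 14-joint pose seen in two coordinate frames.
rng = np.random.default_rng(0)
source = rng.normal(size=(14, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
target = source @ R_true.T + t_true

R_est, t_est = rigid_procrustes(source, target)
```

With noise-free correspondences, as in this toy example, the estimate matches the true rotation and translation up to floating-point error; with noisy joints it gives the least-squares rigid fit.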
3D human pose estimation suffers from the difficulty of gathering 3D ground-truth. While gathering large-scale 2D