Uncalibrated multi-view multiple humans association and 3D pose estimation by adversarial learning


Sara Ershadi-Nasab1 · Shohreh Kasaei2 · Esmaeil Sanaei1

Received: 7 November 2019 / Revised: 10 August 2020 / Accepted: 26 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Multiple human 3D pose estimation is a useful but challenging task in computer vision applications. The ambiguities in estimating the 2D and 3D poses of multiple persons can be resolved by using multi-view frames, in which body parts of some persons that are occluded or self-occluded in one view might be visible in other camera views. However, when the cameras are moving and uncalibrated, associating multiple human body parts across different camera views is a challenging task. This paper presents novel methods for multiple human 3D pose estimation and pose association in multi-view camera frames in an uncalibrated camera setup using an adversarial learning framework. The generator is a 3D pose estimation network that learns a mapping of distance and angular difference matrices between 2D and 3D spaces. The discriminator tries to distinguish the predicted 3D poses from the ground-truth, which helps to force the pose estimator to generate valid 3D poses. To increase the accuracy of the generator network, multi-view frames are used. The estimated 3D poses are associated among multi-view frames by a statistical method. The association and the relative rotation and translation of the cameras with respect to each other are also obtained. This step strengthens the generator network and removes ambiguities in the estimation of occluded or self-occluded body parts. The global 3D poses are the inputs to the discriminator network, with the aim of fooling the discriminator into accepting them as ground-truth poses. Experimental results on multi-view and multi-person datasets (such as Campus, Shelf, Utrecht Multi-Person Motion (UMPM), and KTH Football 2) indicate that the proposed method achieves superior performance in comparison with other state-of-the-art methods, although it does not require any calibration information a priori.

Keywords 3D pose estimation · Multi-view · Human associations · Uncalibrated cameras · Generative adversarial

1 Introduction

For better readability of the paper, the abbreviation list is provided in Table 1.

Multimedia Tools and Applications

Table 1 Abbreviation list

Full form                                Acronym
Procrustes analysis                      PA
Generative adversarial network           GAN
Euclidean distance matrix                EDM
Angular difference matrix                ADM
Convolutional neural network             CNN
3D pictorial structure                   3DPS
Structured support vector machine        SSVM
Expectation-Maximization                 EM
Utrecht Multi-Person Motion              UMPM
Singular value decomposition             SVD
Percentage of correct estimated parts    PCP
Batch normalization                      BN
Rectified linear unit                    ReLU
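As a brief illustration of one entry in Table 1, the following minimal sketch computes a Euclidean distance matrix (EDM) for a set of joints and checks that it is unchanged by a rigid rotation and translation. That invariance is a general property of EDMs and is one plausible reason such a representation suits an uncalibrated setup. The function name, the three-joint toy pose, and the chosen rotation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def euclidean_distance_matrix(joints):
    """Return the (N, N) matrix of pairwise Euclidean distances
    for an (N, D) array of joint coordinates (D = 2 or 3)."""
    joints = np.asarray(joints, dtype=float)
    # Broadcast to (N, N, D) pairwise differences, then take the norm over D.
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical toy pose with three joints (not taken from any dataset).
pose_3d = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
])
edm = euclidean_distance_matrix(pose_3d)

# Apply an arbitrary rigid motion: 90-degree rotation about z plus a translation.
rotation = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
translation = np.array([1.0, -2.0, 0.5])
moved_pose = pose_3d @ rotation.T + translation
edm_moved = euclidean_distance_matrix(moved_pose)

# The EDM is the same before and after the rigid motion.
print(np.allclose(edm, edm_moved))
```

The matrix is symmetric with a zero diagonal, so only the upper triangle carries information; whether the paper feeds the full matrix or only that triangle to the generator is not stated in this section.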

3D human pose estimation suffers from the difficulty of gathering 3D ground-truth. While gathering large-scale 2D