A generalizable approach for multi-view 3D human pose regression
ORIGINAL PAPER
Abdolrahim Kadkhodamohammadi · Nicolas Padoy
Received: 13 March 2019 / Revised: 8 April 2020 / Accepted: 24 August 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract Despite the significant improvement in the performance of monocular pose estimation approaches and their ability to generalize to unseen environments, multi-view approaches often lag behind in accuracy and are specific to certain datasets. This is mainly because (1) contrary to real-world single-view datasets, multi-view datasets are often captured in controlled environments to collect precise 3D annotations, which do not cover all real-world challenges, and (2) the model parameters are learned for specific camera setups. To alleviate these problems, we propose a two-stage approach to detect and estimate 3D human poses, which separates single-view pose detection from multi-view 3D pose estimation. This separation enables us to utilize each dataset for the right task, i.e., single-view datasets for constructing robust pose detection models and multi-view datasets for constructing precise multi-view 3D regression models. In addition, our 3D regression approach only requires 3D pose data and its projections to the views for building the model, hence removing the need for collecting annotated data from the test setup. Our approach can therefore be easily generalized to a new environment by simply projecting 3D poses into 2D during training according to the camera setup used at test time. As 2D poses are collected at test time using a single-view pose detector, which might generate inaccurate detections, we model its characteristics and incorporate this information during training. We demonstrate that incorporating the detector’s characteristics is important to build a robust 3D regression model and that the resulting regression model generalizes well to new multi-view environments. Our evaluation results show that our approach achieves competitive results on the Human3.6M dataset and significantly improves results on a multi-view clinical dataset, the first multi-view dataset generated from live surgery recordings.
Keywords Multi-view human pose estimation · 3D pose regression · Neural networks · Generalizability
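The training-data idea stated in the abstract, projecting 3D poses into each view of the test-time camera setup and accounting for detector inaccuracy, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: an ideal pinhole camera without lens distortion, and isotropic Gaussian pixel noise as a stand-in for the detector's characteristics (the actual detector model used by the authors is not described in this section). The names `project_points`, `make_training_input`, and `noise_std_px` are hypothetical.

```python
import numpy as np


def project_points(points_3d, K, R, t):
    """Project (J, 3) world-space joints to (J, 2) pixel coordinates.

    Assumes an ideal pinhole camera (no distortion) with intrinsics K (3x3),
    rotation R (3x3) and translation t (3,), all joints in front of the camera.
    """
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uvw = cam @ K.T                  # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide


def make_training_input(pose_3d, cameras, noise_std_px=0.0, rng=None):
    """Build one regression input from a 3D pose and a list of (K, R, t) cameras.

    Gaussian pixel noise is an ASSUMED placeholder for a detector error model;
    it is not the model described by the authors.
    """
    rng = np.random.default_rng() if rng is None else rng
    views = []
    for K, R, t in cameras:
        pose_2d = project_points(pose_3d, K, R, t)
        pose_2d = pose_2d + rng.normal(0.0, noise_std_px, size=pose_2d.shape)
        views.append(pose_2d)
    # Concatenate all views into a single flat feature vector.
    return np.concatenate(views, axis=0).ravel()
```

Under these assumptions, adapting to a new environment amounts to replacing the `cameras` list with the calibration of the new setup and regenerating the inputs from the same 3D poses; no annotated images from that setup are needed.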
1 Introduction
Single-view human detection and body pose estimation have received a great deal of attention in computer vision over the last decades because of their importance for various applications, ranging from activity recognition to human-computer interaction. More recently, the emergence of deep learning has pushed the boundaries in many fields, including computer vision. The combination of deep learning with the availability of large datasets, such as MPII Pose [4] and MS COCO [27], has spawned many promising approaches for single-view human detection and pose estimation [10,35,50]. But the presence of clutter and occlusions degrades their per-