
IPSJ Transactions on Computer Vision and Applications

RESEARCH PAPER

Open Access

3D human pose estimation model using location-maps for distorted and disconnected images by a wearable omnidirectional camera

Teppei Miura* and Shinji Sako

Abstract

We address 3D human pose estimation for equirectangular images taken by a wearable omnidirectional camera. The equirectangular images are distorted because the omnidirectional camera is attached close to the front of a person's neck. Furthermore, some parts of the body are disconnected in the image; for instance, when a hand leaves the image at one edge, it reappears at the opposite edge. This distortion and disconnection make 3D pose estimation challenging. To overcome this difficulty, we adopt the location-maps method proposed by Mehta et al., which had previously been used to estimate 3D human poses only for regular images without distortion or disconnection. We focus on a characteristic of location-maps: they can extend 2D joint locations to 3D positions while preserving 2D-3D consistency, without considering kinematic model restrictions or optical properties. In addition, we collect a new dataset composed of equirectangular images and synchronized 3D joint positions for training and evaluation. We validate the capability of location-maps to estimate 3D human poses for distorted and disconnected images. We propose a new location-maps-based model that replaces the backbone network with a state-of-the-art 2D human pose estimation model (HRNet). Our model has a simpler architecture than the reference model proposed by Mehta et al.; nevertheless, it performs better in terms of both accuracy and computational complexity. Finally, we analyze the location-maps method from two perspectives: the map variance and the map scale. This analysis reveals two characteristics of location-maps: (1) the map variance affects how robustly 2D joint locations are extended to 3D positions in the presence of 2D estimation error, and (2) the 3D position accuracy is related to the accuracy of the 2D locations relative to the map scale.

Keywords: 3D pose estimation, location-maps, Omnidirectional camera, Equirectangular image, Distortion, Disconnection
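The core lookup of the location-maps method can be sketched as follows. In the formulation of Mehta et al., the network predicts, for each joint, a 2D heatmap plus three location-maps holding that joint's X, Y, and Z coordinates, and the joint's 3D position is read from the location-maps at the pixel where its heatmap peaks. This is a minimal sketch under stated assumptions, not the authors' implementation: all maps for a joint share one resolution, maps are plain nested lists, no sub-pixel refinement is applied, and the function names (`argmax_2d`, `read_joint_3d`, `read_pose_3d`) are hypothetical. Because each joint is read independently at its own 2D peak, the lookup itself makes no assumption about lens projection or limb connectivity, which is the property the abstract relies on for distorted and disconnected images.

```python
from typing import List, Sequence, Tuple

# Hypothetical type alias: a 2D map stored row-major as map[row][col].
Map2D = Sequence[Sequence[float]]


def argmax_2d(heatmap: Map2D) -> Tuple[int, int]:
    """Return (row, col) of the largest value; ties keep the first hit in row-major order."""
    best_rc = (0, 0)
    best_val = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_val, best_rc = value, (r, c)
    return best_rc


def read_joint_3d(heatmap: Map2D, x_map: Map2D, y_map: Map2D,
                  z_map: Map2D) -> Tuple[float, float, float]:
    """Extend one joint's 2D location to 3D by sampling its X/Y/Z
    location-maps at the pixel where the 2D heatmap peaks."""
    r, c = argmax_2d(heatmap)
    return (x_map[r][c], y_map[r][c], z_map[r][c])


def read_pose_3d(heatmaps: Sequence[Map2D], x_maps: Sequence[Map2D],
                 y_maps: Sequence[Map2D],
                 z_maps: Sequence[Map2D]) -> List[Tuple[float, float, float]]:
    """Apply the lookup independently to every joint (no kinematic model)."""
    return [
        read_joint_3d(h, xm, ym, zm)
        for h, xm, ym, zm in zip(heatmaps, x_maps, y_maps, z_maps)
    ]


if __name__ == "__main__":
    # Toy 3x3 maps for a single joint.
    heat = [[0.0, 0.1, 0.0],
            [0.2, 0.9, 0.3],
            [0.0, 0.0, 0.0]]
    xm = [[c * 10.0 for c in range(3)] for _ in range(3)]  # x varies along columns
    ym = [[r * 10.0 for _ in range(3)] for r in range(3)]  # y varies along rows
    zm = [[500.0] * 3 for _ in range(3)]                    # constant depth
    print(read_joint_3d(heat, xm, ym, zm))  # (10.0, 10.0, 500.0)
```

If the 2D peak lands one pixel off, the lookup simply reads a neighbouring cell (in the toy example, column 2 would give x = 20.0 instead of 10.0), so any 2D error passes straight into the 3D readout; how much it costs depends on how quickly the maps change between neighbouring pixels.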

*Correspondence: [email protected]
Department of Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, 466-8555, Nagoya, Japan

1 Introduction

Human motion capture is widely used in applications such as computer graphics for movies and games, sports science, and sign language recognition. For these purposes, easy and low-cost methods for capturing human motion are needed. One of the main approaches is human pose estimation. In recent years, human pose estimation has been actively researched, and deep neural networks (DNNs) have attracted considerable attention. In human pose estimation research, RGB or RGB-D cameras are commonly used as input devices to capture videos, images, or depth data. The input data are typically taken from a second-person perspective, and the data include approxi
