3D human pose estimation model using location-maps for distorted and disconnected images by a wearable omnidirectional c


IPSJ Transactions on Computer Vision and Applications

RESEARCH PAPER

Open Access

3D human pose estimation model using location-maps for distorted and disconnected images by a wearable omnidirectional camera

Teppei Miura* and Shinji Sako

Abstract
We address 3D human pose estimation for equirectangular images taken by a wearable omnidirectional camera. The equirectangular image is distorted because the omnidirectional camera is attached close to the front of the person's neck. Furthermore, some parts of the body are disconnected in the image; for instance, when a hand moves past one edge of the image, it reappears from the opposite edge. The distortion and disconnection of the images make 3D pose estimation challenging. To overcome this difficulty, we introduce the location-maps method proposed by Mehta et al.; however, that method was previously used to estimate 3D human poses only for regular images without distortion and disconnection. We focus on a characteristic of location-maps: they can extend 2D joint locations to 3D positions with respect to 2D-3D consistency, without considering kinematic model restrictions or optical properties. In addition, we collect a new dataset composed of equirectangular images and synchronized 3D joint positions for training and evaluation. We validate the capability of location-maps to estimate 3D human poses for distorted and disconnected images. We propose a new location-maps-based model by replacing the backbone network with a state-of-the-art 2D human pose estimation model (HRNet). Our model has a simpler architecture than the reference model proposed by Mehta et al., yet it performs better in both accuracy and computational complexity. Finally, we analyze the location-maps method from two perspectives: the map variance and the map scale. This analysis reveals two characteristics of location-maps: (1) the map variance affects how robustly 2D joint locations are extended to 3D positions in the presence of 2D estimation error, and (2) the 3D position accuracy is related to the accuracy of the 2D locations relative to the map scale.

Keywords: 3D pose estimation, Location-maps, Omnidirectional camera, Equirectangular image, Distortion, Disconnection
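The location-maps readout mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes the formulation of Mehta et al., in which each joint has a 2D heatmap plus three location-maps (x, y, z), and the 3D position is read from the location-maps at the heatmap peak. The function name `read_location_maps`, the argument names, and the array layout `(joints, height, width)` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def read_location_maps(heatmaps, loc_x, loc_y, loc_z):
    """Extend 2D joint locations to 3D positions via location-maps.

    All inputs are assumed to have shape (num_joints, height, width).
    Returns (pix2d, pos3d): integer (col, row) peaks of shape (J, 2) and
    the 3D values read at those peaks, shape (J, 3).
    """
    num_joints, height, width = heatmaps.shape

    # 2D joint location: the pixel with the highest heatmap response.
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (height, width))

    # 3D position: the x, y, z location-map values at that same pixel.
    joints = np.arange(num_joints)
    pos3d = np.stack(
        [
            loc_x[joints, rows, cols],
            loc_y[joints, rows, cols],
            loc_z[joints, rows, cols],
        ],
        axis=1,
    )
    pix2d = np.stack([cols, rows], axis=1)
    return pix2d, pos3d
```

In this sketch the lookup is done per joint and per pixel, so a joint whose peak lies at the left or right border of the equirectangular image is read in the same way as any other joint; no skeleton model or camera projection enters the readout step.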

*Correspondence: [email protected]
Department of Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, 466-8555, Nagoya, Japan

1 Introduction
Human motion capture is widely used in applications such as computer graphics for movies and games, sports science, and sign language recognition. These applications need easy and low-cost methods for capturing human motion. One of the main methods is human pose estimation. In recent years, human pose estimation has been actively researched, and deep neural networks (DNNs) have attracted considerable attention. In human pose estimation research, RGB or RGB-D cameras are commonly used as input devices that capture videos, images, or depth data. The input data are typically taken from the second-person perspective, and the data include approxi