3D hand mesh reconstruction from a monocular RGB image



ORIGINAL ARTICLE

Hao Peng¹ · Chuhua Xian¹ · Yunbo Zhang²

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Most existing methods for RGB-image-based 3D hand analysis focus on estimating hand keypoints or poses, which cannot capture the geometric details of the 3D hand shape. In this work, we propose a novel method to reconstruct a 3D hand mesh from a single monocular RGB image. Unlike current parameter-based or pose-based methods, our method directly estimates the 3D hand mesh using a graph convolutional network (GCN). Our network consists of two modules: a hand localization and mask generation module, and a 3D hand mesh reconstruction module. The first module, a VGG16-based network, localizes the hand region in the input image and generates a binary mask of the hand. The second module takes the high-order features from the first and uses a GCN-based network to estimate the coordinates of each vertex of the hand mesh, reconstructing the 3D hand shape. To achieve better accuracy, we propose a novel loss based on the differential properties of the discrete mesh. We also use professional software to create a large synthetic dataset containing both ground-truth 3D hand meshes and poses for training. To handle real-world data, we use a CycleGAN network to transform the domain of real-world images to that of our synthetic dataset. We demonstrate that our method produces accurate 3D hand meshes and is efficient enough for real-time applications.

Keywords Image-based modeling · 3D hand mesh reconstruction · Hand dataset · Hand pose estimation
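The abstract mentions two technical ingredients: a graph convolution operating on the hand-mesh topology, and a loss built from discrete differential properties of the mesh. As a rough, illustrative sketch only (not the paper's actual implementation), the snippet below shows a basic symmetrically normalized graph convolution and a uniform-Laplacian mesh loss in NumPy; all function names and the exact normalization are our own assumptions:

```python
import numpy as np

def normalized_adjacency(edges, n_vertices):
    """A_hat = D^(-1/2) (A + I) D^(-1/2): symmetrically normalized mesh
    adjacency with self-loops, as in a basic Kipf-and-Welling-style GCN
    (illustrative; the paper's exact GCN formulation may differ)."""
    A = np.eye(n_vertices)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))   # degree^(-1/2)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(A_hat, H, W):
    """One graph-convolution layer: average vertex features over mesh
    neighbors (A_hat @ H), mix channels with learned weights W, then ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

def uniform_laplacian(verts, neighbors):
    """Discrete differential coordinates: delta_i = v_i - mean of v_i's
    neighbors. These encode local surface detail independently of the
    vertex's absolute position."""
    return np.stack([v - verts[nb].mean(axis=0)
                     for v, nb in zip(verts, neighbors)])

def laplacian_loss(pred, gt, neighbors):
    """Penalize differences between the differential coordinates of the
    predicted and ground-truth meshes -- one plausible instance of a
    'differential property' loss (hypothetical, not the paper's)."""
    diff = uniform_laplacian(pred, neighbors) - uniform_laplacian(gt, neighbors)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

On a toy tetrahedron mesh, `graph_conv` smooths per-vertex coordinates toward their neighborhood mean, and `laplacian_loss` is zero for identical meshes and grows as local surface detail diverges, which is the behavior a differential-geometry loss is meant to capture.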

✉ Chuhua Xian, [email protected]
Hao Peng, [email protected]
Yunbo Zhang, [email protected]

¹ School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
² Department of Industrial and Systems Engineering, Rochester Institute of Technology, New York, USA

1 Introduction

Hand shapes and poses play an important role in many applications, such as virtual reality (VR), augmented reality (AR), mixed reality (MR), and human–computer interaction (HCI) [14,29]. Many methods now exist for hand pose estimation [9,26] and hand gesture recognition [23]. However, vision-based modeling of a 3D hand shape remains an open problem because of the complexity of hand geometry, the variety of gestures, finger occlusions, etc. Generally, methods for 3D hand modeling fall into three categories: manual modeling with 3D modeling software, scanning-based reconstruction, and image-based modeling. Manual modeling uses 3D modeling software, such as Maya, 3DS Max, or Blender, to model the hand shape; this approach is tedious, time-consuming, and requires professional skills. Scanning-based reconstruction methods use a 3D scanner to obtain a point cloud and then recover the 3D shape with mesh reconstruction algorithms, which suffer from noisy and incomplete point cloud data because of the li