3D hand mesh reconstruction from a monocular RGB image



ORIGINAL ARTICLE

Hao Peng¹ · Chuhua Xian¹ · Yunbo Zhang²

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Most existing methods for RGB-image-based 3D hand analysis focus on estimating hand keypoints or poses, which cannot capture the geometric details of the 3D hand shape. In this work, we propose a novel method to reconstruct a 3D hand mesh from a single monocular RGB image. Unlike current parameter-based or pose-based methods, our method directly estimates the 3D hand mesh using a graph convolutional network (GCN). Our network consists of two modules: a hand localization and mask generation module, and a 3D hand mesh reconstruction module. The first module, a VGG16-based network, localizes the hand region in the input image and generates a binary mask of the hand. The second module takes the high-order features from the first and uses a GCN-based network to estimate the coordinates of each vertex of the hand mesh, reconstructing the 3D hand shape. To achieve better accuracy, we propose a novel loss based on the differential properties of the discrete mesh. We also use professional software to create a large synthetic dataset containing both ground-truth 3D hand meshes and poses for training. To handle real-world data, we use a CycleGAN network to transform the domain of real-world images to that of our synthetic dataset. We demonstrate that our method produces accurate 3D hand meshes and is efficient enough for real-time applications.

Keywords Image-based modeling · 3D hand mesh reconstruction · Hand dataset · Hand pose estimation
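The abstract mentions two technical ingredients: a graph convolution operating on the hand-mesh topology, and a loss built from discrete differential properties of the mesh. As a rough, illustrative sketch only (not the paper's actual implementation), the snippet below shows a basic symmetrically normalized graph convolution and a uniform-Laplacian mesh loss in NumPy; all function names and the exact normalization are our own assumptions:

```python
import numpy as np

def normalized_adjacency(edges, n_vertices):
    """A_hat = D^(-1/2) (A + I) D^(-1/2): symmetrically normalized mesh
    adjacency with self-loops, as in a basic Kipf-and-Welling-style GCN
    (illustrative; the paper's exact GCN formulation may differ)."""
    A = np.eye(n_vertices)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))   # degree^(-1/2)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(A_hat, H, W):
    """One graph-convolution layer: average vertex features over mesh
    neighbors (A_hat @ H), mix channels with learned weights W, then ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

def uniform_laplacian(verts, neighbors):
    """Discrete differential coordinates: delta_i = v_i - mean of v_i's
    neighbors. These encode local surface detail independently of the
    vertex's absolute position."""
    return np.stack([v - verts[nb].mean(axis=0)
                     for v, nb in zip(verts, neighbors)])

def laplacian_loss(pred, gt, neighbors):
    """Penalize differences between the differential coordinates of the
    predicted and ground-truth meshes -- one plausible instance of a
    'differential property' loss (hypothetical, not the paper's)."""
    diff = uniform_laplacian(pred, neighbors) - uniform_laplacian(gt, neighbors)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

On a toy tetrahedron mesh, `graph_conv` smooths per-vertex coordinates toward their neighborhood mean, and `laplacian_loss` is zero for identical meshes and grows as local surface detail diverges, which is the behavior a differential-geometry loss is meant to capture.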

✉ Chuhua Xian, [email protected]
Hao Peng, [email protected]
Yunbo Zhang, [email protected]

¹ School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
² Department of Industrial and Systems Engineering, Rochester Institute of Technology, New York, USA

1 Introduction

Hand shapes and poses play an important role in many applications, such as virtual reality (VR), augmented reality (AR), mixed reality (MR), and human–computer interaction (HCI) [14,29]. Many methods now exist for hand pose estimation [9,26] and hand gesture recognition [23]. However, vision-based modeling of a 3D hand shape remains an open problem because of the complexity of hand geometry, the variety of gestures, finger occlusions, etc. Generally, methods for 3D hand modeling fall into three categories: manual modeling with 3D modeling software, scanning-based reconstruction, and image-based modeling. Manual modeling uses 3D modeling software, such as Maya, 3DS Max, or Blender, to model the hand shape; this approach is tedious, time-consuming, and requires professional skills. Scanning-based reconstruction methods use a 3D scanner to obtain a point cloud and then recover the 3D shape with mesh reconstruction algorithms, which suffer from noisy and incomplete point cloud data because of the li