Human Pose Estimation Using Deep Consensus Voting

Abstract. In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach where each location in the image votes for the position of each keypoint using a convolutional neural net. The voting scheme allows us to utilize information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes not only produces good keypoint predictions but also enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting. This differs from most previous methods, where joint probabilities are learned from relative keypoint locations and are independent of the image. Finally, we combine the keypoint votes and joint probabilities to identify the optimal pose configuration. We show competitive performance on the MPII Human Pose and Leeds Sports Pose datasets.

1 Introduction

In recent years, with the resurgence of deep learning techniques, the accuracy of human pose estimation from a single image has improved dramatically. Yet despite this progress, it remains a challenging computer vision task, and state-of-the-art results are far from human performance.

The general approach in previous works, such as [22,26], is to train a deep neural net as a keypoint detector for all keypoints. Given an image I, the net is fed a patch of the image I_y ⊂ I centered around pixel y and predicts whether y is one of the M keypoints of the model. This process is repeated in a sliding-window fashion, using a fully convolutional implementation, to produce M heat maps, one for each keypoint. Structured prediction, usually by a graphical model, is then used to combine these heat maps into a single pose prediction.

This approach has several drawbacks. First, most pixels belonging to the person are not themselves any of the keypoints and therefore contribute only limited information to the pose estimation process. Information from the entire person can be used to get more reliable predictions, particularly in the face of partial occlusion where the keypoint itself is not visible. Another drawback is that while the individual keypoint predictors use state-of-the-art classification methods to produce high-quality results, the binary terms in the graphical model, which enforce global pose consistency, are based only on relative keypoint location statistics gathered from the training data and are independent of the input image.

I. Lifshitz and E. Fetaya: equal contribution.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46475-6_16) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part II, LNCS 9906, pp. 246–260, 2016. DOI: 10.1007/978-3-319-46475-6_16
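To make the contrast with per-pixel keypoint detection concrete, the following is a minimal sketch of how dense votes could be accumulated into a heat map for one keypoint. It is an illustration only and not the method of this paper: it assumes each pixel casts a single vote as an integer offset, whereas the text describes multi-target votes produced by a convolutional net. The names `accumulate_votes`, `offsets`, `weights`, and `toy_example` are hypothetical.

```python
import numpy as np


def accumulate_votes(offsets, weights, height, width):
    """Sum dense per-pixel votes into one heat map for a single keypoint.

    offsets: (H, W, 2) integer array (hypothetical input); offsets[y, x]
             is the displacement (dy, dx) from pixel (y, x) to where that
             pixel believes the keypoint lies.
    weights: (H, W) array of non-negative vote confidences.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    ty = ys + offsets[..., 0]  # voted row for every pixel
    tx = xs + offsets[..., 1]  # voted column for every pixel
    # Discard votes that land outside the image.
    inside = (ty >= 0) & (ty < height) & (tx >= 0) & (tx < width)
    heat = np.zeros((height, width), dtype=float)
    # np.add.at accumulates correctly when many pixels vote for one cell.
    np.add.at(heat, (ty[inside], tx[inside]), weights[inside])
    return heat


def toy_example():
    """Every pixel of a 5x5 image votes for the same location (2, 3)."""
    h, w = 5, 5
    target = (2, 3)
    ys, xs = np.mgrid[0:h, 0:w]
    offsets = np.stack([target[0] - ys, target[1] - xs], axis=-1)
    weights = np.ones((h, w))
    heat = accumulate_votes(offsets, weights, h, w)
    peak = np.unravel_index(np.argmax(heat), heat.shape)
    return heat, peak
```

In this toy setting all 25 pixels agree, so the heat map has a single peak at the voted location. Because every pixel contributes, the peak can be recovered even when the pixel at the keypoint itself carries no usable evidence, which is the intuition behind using information from the whole person under partial occlusion.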

Fig. 1. Our model’s predicted pose estimation on the MPII-human-pose database test set [1]. Each pose is represented as a stick figure, inferred from predicted joints
