Human Pose Estimation via Convolutional Part Heatmap Regression


Abstract. This paper is on human pose estimation using Convolutional Neural Networks. Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. To this end, we propose a detection-followed-by-regression CNN cascade. The first part of our cascade outputs part detection heatmaps and the second part performs regression on these heatmaps. The benefits of the proposed architecture are multi-fold: it guides the network where to focus in the image and effectively encodes part constraints and context. More importantly, it can effectively cope with occlusions because part detection heatmaps for occluded parts provide low confidence scores which subsequently guide the regression part of our network to rely on contextual information in order to predict the location of these parts. Additionally, we show that the proposed cascade is flexible enough to readily allow the integration of various CNN architectures for both detection and regression, including recent ones based on residual learning. Finally, we illustrate that our cascade achieves top performance on the MPII and LSP data sets. Code can be downloaded from http://www.cs.nott.ac.uk/~psxab5/.

Keywords: Human pose estimation · Part heatmap regression · Convolutional Neural Networks

1 Introduction

Articulated human pose estimation from images is a Computer Vision problem of extraordinary difficulty. Algorithms have to deal with the very large number of feasible human poses, large changes in human appearance (e.g. foreshortening, clothing), part occlusions (including self-occlusions) and the presence of multiple people within close proximity to each other. A key question for addressing these problems is how to extract strong low- and mid-level appearance features capturing discriminative as well as relevant contextual information and how to model complex part relationships allowing for effective yet efficient pose inference. Being capable of performing these tasks in an end-to-end fashion, Convolutional Neural Networks (CNNs) have been recently shown to feature remarkably robust performance and high part localization accuracy. Yet, the accurate estimation of the locations of occluded body parts i

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 717–732, 2016. DOI: 10.1007/978-3-319-46478-7_44

A. Bulat and G. Tzimiropoulos

[Figure 1 diagram; labels: part detection network, part heatmaps, stacked part heatmaps, regression network, regression heatmaps; 256 x 256]

Fig. 1. Proposed architecture: Our CNN cascade consists of two connected deep subnetworks. The first one (upper part in the figure) is a part detection network trained to detect the individual body parts using a per-pixel sigmoid loss. Its output is a set of N part heatmaps. The second one is a regression subnetwork that jointly regresses the part heatmaps stacked alongside the input image to confidence maps representing the location of the body parts.
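The data flow of the cascade can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the per-pixel sigmoid loss is the usual binary cross-entropy applied to a sigmoid of each heatmap pixel, and that "stacked alongside the input image" means channel-wise concatenation. The function names `per_pixel_sigmoid_loss` and `stack_heatmaps_with_image` are hypothetical, and nested Python lists stand in for tensors.

```python
import math


def per_pixel_sigmoid_loss(logits, targets):
    """Mean binary cross-entropy over every pixel of every part heatmap.

    logits, targets: nested lists shaped [N parts][H][W]; targets are 0 or 1.
    Assumption: this is the standard sigmoid cross-entropy, written in its
    numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    total, count = 0.0, 0
    for part_logits, part_targets in zip(logits, targets):
        for row_z, row_t in zip(part_logits, part_targets):
            for z, t in zip(row_z, row_t):
                total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
                count += 1
    return total / count


def stack_heatmaps_with_image(heatmaps, image):
    """Channel-wise concatenation (assumed): N heatmaps + C image channels.

    heatmaps: [N][H][W], image: [C][H][W]; returns a list of N + C channels
    that would form the input of the regression subnetwork.
    """
    if any(len(ch) != len(image[0]) for ch in heatmaps):
        raise ValueError("heatmaps and image must share the same height")
    return list(heatmaps) + list(image)


# Toy example: N = 2 parts, 2 x 2 pixels, RGB image (all values made up).
toy_logits = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
toy_targets = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
toy_image = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]

toy_loss = per_pixel_sigmoid_loss(toy_logits, toy_targets)
toy_stacked = stack_heatmaps_with_image(toy_logits, toy_image)
```

With all logits at zero, every pixel is predicted with probability 0.5, so the toy loss equals log 2 regardless of the targets. The stacked input has N + 3 channels, which is what lets the second subnetwork see both the part evidence and the original image context.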
