Enhancing feature fusion for human pose estimation
ORIGINAL PAPER
Rui Wang1 · Jiangwei Tong1 · Xiangyang Wang1

Received: 6 January 2020 / Revised: 12 May 2020 / Accepted: 17 July 2020 / Published online: 24 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Current human pose estimation methods mainly rely on designing efficient Convolutional Neural Network (CNN) frameworks. These CNN architectures typically consist of high-to-low-resolution sub-networks that learn semantic information, followed by low-to-high-resolution sub-networks that restore the resolution to locate the keypoints. Low-level features have high resolution but little semantic information, while high-level features have rich semantic information but lack high-resolution detail, so fusing features from different levels is important for the final performance. However, most existing models implement feature fusion by simply concatenating low-level and high-level features, without considering the gap between their spatial resolutions and semantic levels. In this paper, we propose a new feature fusion method for human pose estimation. We introduce high-level semantic information into low-level features to enhance feature fusion. Further, to retain both the high-level semantic information and the high-resolution location details, we use Global Convolutional Network (GCN) blocks to bridge the gap between low-level and high-level features. Experiments on the MPII and LSP human pose estimation datasets demonstrate that efficient feature fusion can significantly improve performance. The code is available at: https://github.com/tongjiangwei/FeatureFusion.

Keywords Human pose estimation · Convolutional neural networks · Feature fusion · Global convolutional network (GCN)
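As background on the GCN blocks named in the abstract: a Global Convolutional Network block approximates a large k × k convolution with two parallel separable branches (1 × k followed by k × 1, and k × 1 followed by 1 × k), preserving a large effective receptive field at a fraction of the parameter cost. A minimal parameter-count sketch follows; the channel sizes and kernel size are illustrative assumptions, not values taken from this paper:

```python
def conv_params(c_in, c_out, kh, kw):
    # Weight count of a single convolution layer (bias ignored).
    return c_in * c_out * kh * kw

def gcn_params(c_in, c_out, k):
    # GCN block: two parallel branches, (1xk -> kx1) and (kx1 -> 1xk),
    # whose outputs are summed.
    branch = conv_params(c_in, c_out, 1, k) + conv_params(c_out, c_out, k, 1)
    return 2 * branch

dense = conv_params(256, 256, 15, 15)  # a plain 15x15 convolution
gcn = gcn_params(256, 256, 15)         # separable GCN approximation
print(dense, gcn)                      # 14745600 3932160
```

The separable design covers the same 15 × 15 receptive field with under a third of the parameters, which is what makes large-kernel fusion blocks practical.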
1 Introduction

2D human pose estimation (HPE) is a challenging problem in computer vision. It aims to recognize and locate human anatomical keypoints in images, which is fundamental for other applications such as human action recognition, human-computer interaction and animation. Recently, most human pose estimation methods have achieved state-of-the-art performance by using Convolutional Neural Networks (CNNs). For instance, as shown in Fig. 1a, Hourglass [1] proposes an exemplary encoder-decoder structure: the encoder consists of high-to-low-resolution networks, and the decoder recovers the full resolution through a low-to-high process. PyraNet [2] introduces a Pyramid Residual Module to learn image features in multi-scale convolutional
Corresponding author: Xiangyang Wang [email protected]
Rui Wang [email protected] · Jiangwei Tong [email protected]
1 School of Communication and Information Engineering, Shanghai University, Shanghai, China
networks. Furthermore, based on a ResNet [3] backbone, SimpleBaseline [4] adopts transposed convolutions to restore high-resolution representations. Despite this great progress, some challenges remain, such as occluded keypoints and cluttered backgrounds. The main reason is that these networks do not properly handle the
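The transposed-convolution upsampling used in decoders such as SimpleBaseline's can be illustrated with a minimal single-channel sketch; this is a naive NumPy implementation, and the feature and kernel values are illustrative assumptions:

```python
import numpy as np

def transposed_conv2d(x, w, stride=2):
    """Naive 2-D transposed convolution: each input pixel scatters a
    scaled copy of the kernel into the (larger) output grid."""
    h_in, w_in = x.shape
    kh, kw = w.shape
    out = np.zeros((stride * (h_in - 1) + kh, stride * (w_in - 1) + kw))
    for i in range(h_in):
        for j in range(w_in):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * w
    return out

feat = np.arange(4.0).reshape(2, 2)   # a 2x2 low-resolution feature map
kernel = np.ones((2, 2))
up = transposed_conv2d(feat, kernel)  # upsampled to 4x4
print(up.shape)                       # (4, 4)
```

With stride equal to the kernel size there is no overlap, so each input value simply tiles a 2 × 2 block of the output; smaller strides produce overlapping, summed contributions, which is how learned transposed convolutions smooth the upsampled map.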