Lightweight densely connected residual network for human pose estimation
ORIGINAL RESEARCH PAPER
Lianping Yang¹ · Yu Qin¹ · Xiangde Zhang¹

Received: 27 March 2020 / Accepted: 21 September 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Most existing methods focus on improving the accuracy of human pose estimation and largely ignore the size of their models. Besides accuracy, however, real-time performance and speed also matter. In this paper, we present a new module, the Densely Connected Residual Module, which effectively reduces the number of parameters in our network. We introduce this module into the backbone of High-Resolution Net. In addition, we replace the direct-addition fusion at the end of the network with pyramid fusion. Because our network does not need ImageNet pre-training, the total training time is sharply reduced. We conduct experiments on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. Compared with High-Resolution Net, our network reduces the number of parameters by around 72% and the amount of computation by around 14%, making it considerably more lightweight. At test time, our model processes an image in 25 ms, which essentially achieves real-time performance. The code is available at https://github.com/consistent1997/LDCRN.

Keywords Human pose estimation · Densely connected residual module · Pyramid fusion · High-resolution net
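The abstract reports the savings in parameters (about 72%) and in computation (about 14%) separately, and the two need not move together. For a convolution, the parameter count depends only on the kernel size and the channel widths, whereas the multiply-accumulate (MAC) count also scales with the spatial size of the output map. The short sketch below illustrates this with two hypothetical stride-1 layers; the layer shapes and the helper names `conv_params`, `conv_macs` and `reduction_percent` are illustrative assumptions, not values or code from the paper or from the LDCRN repository.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Number of learnable weights in a k x k convolution (plus optional bias)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv_macs(k: int, c_in: int, c_out: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of a k x k convolution producing an h_out x w_out map
    (bias additions ignored)."""
    return k * k * c_in * c_out * h_out * w_out


def reduction_percent(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (1.0 - after / before)


# Hypothetical layers (assumed shapes, not taken from the paper):
# a narrow layer on a high-resolution map and a wide layer on a low-resolution map.
narrow_high_res = conv_params(3, 32, 32), conv_macs(3, 32, 32, 64, 48)
wide_low_res = conv_params(3, 256, 256), conv_macs(3, 256, 256, 8, 6)

print("narrow, 64x48:", narrow_high_res)  # (9216, 28311552)
print("wide, 8x6:    ", wide_low_res)     # (589824, 28311552)
```

Both hypothetical layers cost the same 28,311,552 MACs, yet the wide low-resolution layer holds 64 times as many parameters, so `reduction_percent(589824, 9216)` evaluates to 98.4375. A design change that mostly slims wide, low-resolution layers can therefore cut many parameters while barely changing total computation; the text does not say whether this is what produces the paper's 72% versus 14% gap.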
* Corresponding author: Xiangde Zhang
¹ College of Sciences, Northeastern University, Shenyang 110819, Liaoning, China

1 Introduction
Human pose estimation (HPE) has developed rapidly in the last few years. It is a fundamental yet challenging problem in computer vision whose aim is to localize human keypoints such as the wrists and elbows. As a basic technique for human behavior analysis, it has attracted increasing attention, with applications that include human behavior recognition [1–5], human–computer interaction and animation. HPE can be divided into single-person and multi-person pose estimation. This paper focuses on single-person pose estimation, since this problem is a basic part of other related problems such as multi-person estimation [6–14], video pose estimation [15] and pose tracking [16, 17].

Most traditional approaches to single-person pose estimation have adopted probabilistic graphical models [18–21]. For example, tree models and random forest models were shown to be very effective for HPE. The pictorial structure model optimizes a configuration of parts as a function of the local image evidence for each part and a prior over the human kinematic chain. Convolutional pose machines [22] were proposed next, marking the transition from traditional methods to deep learning methods. The recent developments show that deep convolutional neural networks h