Human attribute recognition method based on pose estimation and multiple-feature fusion

ORIGINAL PAPER

Xiao Ke1,2,3 · Tongan Liu1,3 · Zhenda Li1,3

Received: 15 October 2019 / Revised: 3 February 2020 / Accepted: 2 April 2020

© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract As easy-to-search semantic information, human clothing attributes have important research value in the field of computer vision. Existing attribute recognition methods suffer from problems such as interference from environmental factors and, as a result, show poor clothing positioning accuracy. To address these problems, a human attribute recognition method based on human pose estimation and multiple-feature fusion is proposed. First, retrieval results are obtained through appearance feature matching for use in subsequent attribute recognition. Then, a deep SSD-based human pose estimation method locates the foreground area belonging to the human in the image and excludes background interference. Finally, the analytical results of the various methods are combined: an iterative smoothing process and a maximum a posteriori probability assignment method are adopted to strengthen the correlation between attribute labels and pixels, yielding the final attribute recognition results. Experiments on the benchmark dataset show that our model improves performance and solves the problems of inaccurate clothing label recognition and pixel resolution area deviation found in a single recognition mode.

Keywords Deep learning · SSD · Pose estimation · Multiple-feature · Human attribute recognition
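The final stage summarized in the abstract (combining the outputs of several methods, then iterative smoothing and maximum a posteriori label assignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_and_assign`, the log-linear fusion rule, the 4-neighbour averaging used for smoothing, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def fuse_and_assign(prob_maps, weights=None, n_iters=3):
    """Hypothetical sketch: fuse per-pixel label probabilities and assign labels.

    prob_maps: list of K arrays of shape (H, W, L), one per recognition method,
               each holding per-pixel probabilities over L attribute labels.
    Returns an (H, W) integer array of label indices.
    """
    stack = np.stack(prob_maps)                      # (K, H, W, L)
    k = stack.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)                # assumed: equal trust in each method
    weights = np.asarray(weights, dtype=float)

    # Assumed fusion rule: weighted sum of log-probabilities (log-linear pooling).
    log_post = np.tensordot(weights, np.log(stack + 1e-12), axes=1)  # (H, W, L)
    post = np.exp(log_post - log_post.max(axis=-1, keepdims=True))
    post /= post.sum(axis=-1, keepdims=True)

    # Assumed smoothing: repeatedly blend each pixel's posterior with the
    # mean posterior of its four neighbours (edges are replicated).
    for _ in range(n_iters):
        padded = np.pad(post, ((1, 1), (1, 1), (0, 0)), mode="edge")
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        post = 0.5 * post + 0.5 * neigh
        post /= post.sum(axis=-1, keepdims=True)

    # Maximum a posteriori assignment: pick the most probable label per pixel.
    return post.argmax(axis=-1)
```

In this sketch, an isolated pixel whose label disagrees with all of its neighbours is typically overwritten by the smoothing step, which is one simple way to tie pixel labels more closely to their surroundings.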

✉ Xiao Ke [email protected]

1 College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China

2 Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou 350003, China

3 Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350116, China

1 Introduction

Human attribute recognition [1] has important research value in the field of computer vision. Human attributes such as age, gender, and clothing worn can be used as easy-to-search semantic information that can be applied to video surveillance for biometric recognition, as well as to face detection [2], automatic image annotation [3], and saliency detection [4]. An important advantage of semantic information based on low-level visual features is its robustness to viewpoint changes, which means that human attributes can serve as a basis for subsequent long-term computer vision work. Recognizing human attributes in real-world surveillance video images is challenging for a variety of reasons. First, imaging quality is usually poor, with low resolution and a high susceptibility to motion blur [5]. Second, recognition may be affected by the appearance of the clothing, and because human poses differ from image to image, the corresponding attributes can be located in different spatial positions i