Human Attribute Recognition by Deep Hierarchical Contexts
Abstract. We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections and combine them with the whole body into a pose-normalized deep representation. We further improve recognition by using deep hierarchical contexts ranging from the human-centric level to the scene level. The human-centric context captures relations between people, which we compute from the nearest-neighbor parts of other people on a pyramid of CNN feature maps; the matched parts are then average-pooled and act as a similarity regularizer. To exploit the scene context, we re-score the human-centric predictions with a global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, we introduce the large-scale WIDER Attribute dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) with human attribute and image event annotations. Our method surpasses competitive baselines on this dataset and other popular benchmarks.
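The two context mechanisms summarized above (nearest-neighbor part matching on a CNN feature pyramid with average pooling, followed by scene-aware re-scoring) can be illustrated with a minimal sketch. The snippet below is only a schematic, not the paper's implementation: the cosine-similarity matching, the weighted-sum fusion, and all names such as human_centric_context, scene_aware_rescore, and alpha are assumptions introduced here for illustration; in the paper, the scene branch and the fusion are learned jointly inside the CNN.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def human_centric_context(target_parts, other_people_parts):
    """Match each part feature of the target person to its nearest-neighbor
    part among the other people in the image, then average-pool the matches.
    The pooled vector acts as a similarity regularizer on the target's
    representation (rough sketch of the idea, not the paper's exact layers)."""
    target = l2_normalize(np.asarray(target_parts))        # (P, D) target-person parts
    pool = l2_normalize(np.vstack(other_people_parts))     # (N, D) parts of all other people
    sims = target @ pool.T                                 # cosine similarities, (P, N)
    matched = pool[sims.argmax(axis=1)]                    # nearest neighbor per target part
    return matched.mean(axis=0)                            # average-pooled context vector, (D,)

def scene_aware_rescore(human_centric_scores, scene_scores, alpha=0.5):
    """Fuse per-attribute human-centric predictions with scores from a global
    scene classifier. The weighted sum and alpha are placeholders for the
    jointly learned fusion described in the paper."""
    return (1 - alpha) * human_centric_scores + alpha * scene_scores

# Toy usage with random vectors standing in for CNN feature-pyramid activations.
rng = np.random.default_rng(0)
target_parts = rng.normal(size=(3, 128))                   # 3 detected parts of the target person
others = [rng.normal(size=(3, 128)) for _ in range(4)]     # parts of 4 neighboring people
context_vec = human_centric_context(target_parts, others)
human_scores = rng.normal(size=14)                         # e.g., 14 attribute scores
scene_scores = rng.normal(size=14)                         # scene-conditioned attribute scores
final_scores = scene_aware_rescore(human_scores, scene_scores)
```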
1 Introduction
Accurate recognition of human attributes such as gender and clothing style can benefit many applications, such as person re-identification [1-4] in videos. However, the task remains challenging in unconstrained settings, where images of people exhibit large variations in viewpoint, pose, illumination, and occlusion. Consider, for example, Fig. 1, where inferring the attributes "formal suits" and "sunglasses" from only the target person is very difficult due to occlusion and low image quality, respectively. Fortunately, we have access to hierarchical contexts, from neighboring similar people to the global image scene in which the target person appears. Leveraging such contextual cues makes attributes much more recognizable; for example, being aware of a funeral event, we would be more confident about people wearing "formal suits". We build on this intuition to develop a robust method for unconstrained human attribute recognition.
Fig. 1. WIDER Attribute - example images motivating the use of hierarchical contexts for robust attribute recognition for the target person (red box): the human-centric context and the scene-level context help resolve visual ambiguities due to occlusion and low image quality (low resolution/blurring). (Color figure online)
Our method is inspired by recent attribute models that use parts, such as Poselets [5], the Deformable Part Model (DPM) [6], and window-specific parts [7]. These methods are robust against pose and viewpoint variations. They are also capable of localizing attribute clues at varying scales (e.g. small glasses).