Human Attribute Recognition by Deep Hierarchical Contexts
Abstract. We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections and combine them with the whole body into a pose-normalized deep representation. We further improve recognition by using deep hierarchical contexts ranging from the human-centric level to the scene level. The human-centric context captures relations between people, which we compute from the nearest-neighbor parts of other people on a pyramid of CNN feature maps; the matched parts are then average-pooled and act as a similarity regularizer. To exploit the scene context, we re-score the human-centric predictions with a global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, we introduce the large-scale WIDER Attribute dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) with human attribute and image event annotations. Our method surpasses competitive baselines on this dataset and other popular benchmarks.
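The two context mechanisms summarized above (nearest-neighbor part matching on a CNN feature pyramid with average pooling, followed by scene-aware re-scoring) can be illustrated with a minimal sketch. The snippet below is only a schematic, not the paper's implementation: the cosine-similarity matching, the weighted-sum fusion, and all names such as human_centric_context, scene_aware_rescore, and alpha are assumptions introduced here for illustration; in the paper, the scene branch and the fusion are learned jointly inside the CNN.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def human_centric_context(target_parts, other_people_parts):
    """Match each part feature of the target person to its nearest-neighbor
    part among the other people in the image, then average-pool the matches.
    The pooled vector acts as a similarity regularizer on the target's
    representation (rough sketch of the idea, not the paper's exact layers)."""
    target = l2_normalize(np.asarray(target_parts))        # (P, D) target-person parts
    pool = l2_normalize(np.vstack(other_people_parts))     # (N, D) parts of all other people
    sims = target @ pool.T                                 # cosine similarities, (P, N)
    matched = pool[sims.argmax(axis=1)]                    # nearest neighbor per target part
    return matched.mean(axis=0)                            # average-pooled context vector, (D,)

def scene_aware_rescore(human_centric_scores, scene_scores, alpha=0.5):
    """Fuse per-attribute human-centric predictions with scores from a global
    scene classifier. The weighted sum and alpha are placeholders for the
    jointly learned fusion described in the paper."""
    return (1 - alpha) * human_centric_scores + alpha * scene_scores

# Toy usage with random vectors standing in for CNN feature-pyramid activations.
rng = np.random.default_rng(0)
target_parts = rng.normal(size=(3, 128))                   # 3 detected parts of the target person
others = [rng.normal(size=(3, 128)) for _ in range(4)]     # parts of 4 neighboring people
context_vec = human_centric_context(target_parts, others)
human_scores = rng.normal(size=14)                         # e.g., 14 attribute scores
scene_scores = rng.normal(size=14)                         # scene-conditioned attribute scores
final_scores = scene_aware_rescore(human_scores, scene_scores)
```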
1 Introduction
Accurate recognition of human attributes such as gender and clothing style can benefit many applications, such as person re-identification [1-4] in videos. However, the task remains challenging in unconstrained settings, where images of people exhibit large variations in viewpoint, pose, illumination, and occlusion. Consider, for example, Fig. 1, where inferring the attributes "formal suits" and "sunglasses" from only the target person is very difficult due to occlusion and low image quality, respectively. Fortunately, we have access to hierarchical contexts, from neighboring similar people to the global image scene in which the target person appears. Leveraging such contextual cues makes attributes much more recognizable; for example, being aware of a funeral event, we would be more confident about people wearing "formal suits". We build on this intuition to develop a robust method for unconstrained human attribute recognition.
Fig. 1. WIDER Attribute - example images motivating the use of hierarchical contexts for robust attribute recognition for the target person (red box): the human-centric context and the scene-level context help resolve visual ambiguities due to occlusion and low image quality (low resolution/blurring). (Color figure online)
Our method is inspired by recent attribute models that use parts, such as Poselets [5], the Deformable Part Model (DPM) [6], and window-specific parts [7]. These methods are robust against pose and viewpoint variations. They are also capable of localizing attribute clues at varying scales (e.g. small glasses).