Deep Bimodal Regression for Apparent Personality Analysis

Abstract. Apparent personality analysis from short video sequences is a challenging problem in computer vision and multimedia research. To capture rich information from both the visual and the audio modalities of videos, we propose the Deep Bimodal Regression (DBR) framework. In DBR, for the visual modality, we modify traditional convolutional neural networks to exploit important visual cues. For the audio modality, with model efficiency in mind, we extract audio representations and build a linear regressor on them. To combine the complementary information from the two modalities, we ensemble the predicted regression scores using both early fusion and late fusion. Finally, based on the proposed framework, we developed a solution for the Apparent Personality Analysis competition track of the ChaLearn Looking at People challenge, held in association with ECCV 2016. DBR won first place in this challenge, which had 86 registered teams.

Keywords: Apparent personality analysis · Deep regression learning · Bimodal learning · Convolutional neural networks
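The abstract says DBR ensembles the per-modality regression scores by early and late fusion. As a rough illustration of the late-fusion idea only, the Python sketch below combines visual and audio predictions for the five traits with a weighted average. The function name `late_fusion`, the single `visual_weight` parameter, and the dictionary layout are hypothetical choices made for this example; they are not the paper's actual fusion weights or procedure.

```python
from typing import Dict, List

# The Big Five traits named in the text; the ordering is an arbitrary choice.
TRAITS: List[str] = [
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]


def late_fusion(
    visual: Dict[str, float],
    audio: Dict[str, float],
    visual_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine per-trait regression scores predicted from two modalities.

    Each fused score is a convex combination of the visual and the audio
    prediction for that trait. `visual_weight` is a hypothetical
    hyperparameter (e.g. tuned on validation data), not a value from DBR.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    audio_weight = 1.0 - visual_weight
    # Weighted average, computed independently for every trait.
    return {
        trait: visual_weight * visual[trait] + audio_weight * audio[trait]
        for trait in TRAITS
    }


if __name__ == "__main__":
    # Made-up predictions for a single video, for illustration only.
    visual_scores = {trait: 0.6 for trait in TRAITS}
    audio_scores = {trait: 0.4 for trait in TRAITS}
    fused = late_fusion(visual_scores, audio_scores, visual_weight=0.75)
    for trait, score in fused.items():
        print(f"{trait}: {score:.3f}")
```

A single shared weight keeps the sketch minimal; learning one weight per trait, or fusing features before regression (early fusion), are alternative designs that this example does not attempt to reproduce.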

1 Introduction

Video analysis is one of the key tasks in computer vision and multimedia research, especially human-centered video analysis. In recent years, human-centered videos have become ubiquitous on the internet, which has encouraged the development of algorithms that analyze their semantic content for various applications, including first-person video analysis [14,19,21], activity recognition [1,4], gesture and pose recognition [8,11,22], and many more [13,15,20,23]. Apparent personality analysis (APA) is another important problem in human-centered video analysis. The goal of APA is to develop algorithms that recognize the personality traits of users from short video sequences. Personality is usually decomposed into five components, known as the Big Five traits: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.

This work was supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization. X.-S. Wei is the team director of the APA competition, and J. Wu is the corresponding author. © Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part III, LNCS 9915, pp. 311–324, 2016. DOI: 10.1007/978-3-319-49409-8_25

C.-L. Zhang et al.

Effective apparent personality analysis is challenging due to several factors: cultural and individual differences in the tempo and style of articulation, variable observation conditions, the small size of faces in images taken in typical scenarios, noise in camera channels, the unbounded variety of out-of-vocabulary motions, and real-time performance constraints. In this paper, we propose the Deep Bimodal Regression (DBR) framework for APA. As shown in Fig. 1, DBR treats human-centered videos as having two modalities, i.e., the visual and the audio modalities. For these two modalities, deep visual regression networks and audio regression models, respectively, are built for c