Human emotion recognition based on the weighted integration method using image sequences and acoustic features
Multimedia Tools and Applications

Sung-Woo Byun 1 · Seok-Pil Lee 2 (corresponding author)

1 Graduate School, Department of Computer Science, SangMyung University, Seoul, Republic of Korea
2 Department of Electronic Engineering, SangMyung University, Seoul, Republic of Korea

Received: 4 June 2020 / Revised: 27 July 2020 / Accepted: 9 September 2020
© The Author(s) 2020
Abstract
People generally perceive other people's emotions from speech and facial expressions, so using speech signals and facial images together can be helpful. However, because speech and image data have different characteristics, combining the two inputs remains a challenging issue in emotion-recognition research. In this paper, we propose a method to recognize emotions by synchronizing speech signals and image sequences. We design three deep networks. One network is trained on image sequences, focusing on changes in facial expression. Facial landmarks are fed to a second network to reflect facial motion. The speech signals are first converted to acoustic features, which serve as the input to the third network, synchronized with the image sequence. These three networks are combined using a novel integration method to boost the performance of emotion recognition. An accuracy comparison test was conducted to verify the proposed method. The results demonstrate that the proposed method performs more accurately than the methods of previous studies.

Keywords: Emotion recognition · Acoustic feature · Facial expression · Model integration
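As a rough illustration of the late-fusion idea summarized above, the sketch below combines the class-probability outputs of three networks with a fixed convex weighting. The weights, label set, and probability values are illustrative assumptions for this sketch; they are not the learned integration scheme proposed in the paper.

```python
# Minimal late-fusion sketch: three networks each emit class probabilities,
# and a convex combination of those outputs yields the final prediction.
# Weights and labels below are placeholders, not values from the paper.
import numpy as np

EMOTIONS = ["anger", "happiness", "neutrality", "sadness"]  # assumed label set

def weighted_integration(p_image, p_landmark, p_audio, w=(0.4, 0.3, 0.3)):
    """Combine per-network softmax outputs with fixed weights."""
    probs = np.stack([p_image, p_landmark, p_audio])  # shape (3, n_classes)
    fused = np.average(probs, axis=0, weights=w)      # convex combination
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: hypothetical outputs of the three networks for one utterance.
p_img = np.array([0.10, 0.70, 0.15, 0.05])  # image-sequence network
p_lmk = np.array([0.20, 0.55, 0.15, 0.10])  # facial-landmark network
p_aud = np.array([0.05, 0.60, 0.25, 0.10])  # acoustic-feature network
label, fused = weighted_integration(p_img, p_lmk, p_aud)
print(label, fused)
```

A fixed weighting like this is the simplest baseline; the weighted integration method described in the paper assigns the network weights in a more principled way.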
1 Introduction

Recently, high-performance personal computers have become widely popularized alongside the technological development of the information society. Accordingly, the interaction between
humans and computers is actively changing into a bidirectional interface, so a better understanding of human emotions is needed, which could improve human–machine interaction systems [4]. In signal processing, emotion recognition has become an attractive research topic [45]. The goal of such a human interface is to extract and recognize an individual's emotional state accurately and to provide personalized media according to the user's emotional state. Emotion refers to a conscious mental reaction, subjectively experienced as a strong feeling and typically accompanied by physiological and behavioral changes in the body [3]. To recognize a user's emotional state, several studies have applied different forms of input, such as speech, facial expressions, video, and text [11, 13, 15, 25, 39, 42, 47]. Among the methods using these inputs, facial emotion recognition (FER) has gained substantial attention over the past decades. Conventional FER approaches generally have three main steps: 1) detecting a facial region in an input image, 2) extracting facial features, and 3) recognizing emotions (see the sketch below). In conventional methods, it is
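For concreteness, here is a minimal sketch of that conventional three-step pipeline built from classical components: a Haar-cascade face detector, HOG features, and an SVM classifier. These specific components are assumptions chosen for illustration, not the method proposed in this paper, which replaces such hand-crafted pipelines with deep networks.

```python
# Conventional FER pipeline sketch: (1) detect the face, (2) extract
# hand-crafted features, (3) classify the emotion. Components here are
# illustrative classical choices, not the paper's proposed method.
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_features(gray_frame):
    """Steps 1 and 2: crop the largest detected face, compute HOG features."""
    faces = detector.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    face = cv2.resize(gray_frame[y:y + h, x:x + w], (64, 64))
    return hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Step 3: a classifier trained on labeled face features (training data
# assumed to exist; not shown here):
#   clf = SVC(probability=True).fit(train_features, train_labels)
#   emotion = clf.predict([extract_features(gray_frame)])

if __name__ == "__main__":
    dummy = np.zeros((128, 128), dtype=np.uint8)  # blank frame: no face
    print(extract_features(dummy))                # prints None
```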