Speaker independent feature selection for speech emotion recognition: A multi-task approach
Elham Kalhor¹ and Behzad Bakhtiari¹

Received: 4 May 2019 / Revised: 25 August 2020 / Accepted: 19 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Nowadays, automatic speech emotion recognition has numerous applications. One of the important steps in these systems is feature selection. Because it is not known which acoustic features of a person's speech are related to its emotional content, much effort has been made to introduce a wide range of acoustic features. However, since employing all of these features lowers the learning efficiency of classifiers, a subset must be selected. Moreover, when there are several speakers, the chosen features must also be speaker-independent. For this reason, the present paper attempts to select features that are not only related to the emotion of speech but are also speaker-independent. To this end, the current study proposes a multi-task approach that selects the proper speaker-independent features for each pair of classes. The selected features are then given to a binary classifier for that pair, and the outputs of these classifiers are appropriately combined to produce the answer to the multi-class problem. Simulation results reveal that the proposed approach outperforms other methods in both recognition accuracy and runtime.

Keywords: Speech emotion recognition · Multi-task feature selection · Speaker-independent features
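The abstract describes a three-stage pipeline: select speaker-independent features for each pair of emotion classes, train a binary classifier per pair, and combine the pairwise outputs into a multi-class decision. The following is a minimal one-vs-one sketch of that structure in plain Python; the mean-difference ranking criterion and nearest-centroid classifiers are illustrative placeholders, not the authors' multi-task selector or their actual classifiers.

```python
from itertools import combinations

def select_pair_features(X, y, a, b, k):
    """Rank features by |mean_a - mean_b| and keep the top k
    (an illustrative criterion, not the paper's multi-task selector)."""
    Xa = [x for x, lab in zip(X, y) if lab == a]
    Xb = [x for x, lab in zip(X, y) if lab == b]
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    score = [abs(mean(Xa, j) - mean(Xb, j)) for j in range(len(X[0]))]
    return sorted(range(len(X[0])), key=score.__getitem__, reverse=True)[:k]

def train_pair(X, y, a, b, feats):
    """Nearest-centroid binary classifier restricted to the selected features."""
    def centroid(lab):
        rows = [[x[j] for j in feats] for x, l in zip(X, y) if l == lab]
        return [sum(col) / len(rows) for col in zip(*rows)]
    ca, cb = centroid(a), centroid(b)
    def predict(x):
        xs = [x[j] for j in feats]
        da = sum((u - v) ** 2 for u, v in zip(xs, ca))
        db = sum((u - v) ** 2 for u, v in zip(xs, cb))
        return a if da <= db else b
    return predict

def one_vs_one(X, y, k=2):
    """One classifier per class pair, each trained on its own feature
    subset; pairwise predictions are combined by majority vote."""
    classes = sorted(set(y))
    pairs = [train_pair(X, y, a, b, select_pair_features(X, y, a, b, k))
             for a, b in combinations(classes, 2)]
    def predict(x):
        votes = {c: 0 for c in classes}
        for clf in pairs:
            votes[clf(x)] += 1
        return max(classes, key=votes.get)
    return predict

# Toy data: three "emotion" classes separated along features 0 and 1;
# features 2 and 3 are uninformative and get filtered out per pair.
X = [[0, 0, 5, 5], [1, 0, 5, 5],    # class 0
     [10, 0, 5, 5], [11, 0, 5, 5],  # class 1
     [0, 10, 5, 5], [0, 11, 5, 5]]  # class 2
y = [0, 0, 1, 1, 2, 2]
predict = one_vs_one(X, y, k=2)
```

The key property mirrored from the paper is that each class pair gets its own feature subset, so a feature useful for separating, say, anger from sadness need not be used when separating happiness from neutrality.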
1 Introduction

An individual's emotional state is formed from a set of basic emotions, such as happiness, sadness, anger, disgust, boredom, surprise, fear, and neutrality. Emotions play a prominent role in human communication and are critical in shaping behavior under different conditions.
* Behzad Bakhtiari
  [email protected]

  Elham Kalhor
  [email protected]

¹ Department of Computer Engineering, Sadjad University of Technology, No. 64 Jalal Al Ahmad St, 9188148848 Mashhad, Iran
Multimedia Tools and Applications
On the one hand, emotions cause psychological changes that form in the brain and manifest themselves in human reactions. In addition, emotions increase or decrease the physiological arousal of the body and positively or negatively affect behavior and thoughts. Moreover, the emotion of speech depends on the speaker's language and culture, gender, age, speech content, and other factors [12, 20].

Speech emotion recognition offers numerous applications in human-machine communication systems, for example in education, computer games, medicine, customer communication systems, call centers, and mobile communications [5, 31, 32, 44, 45]. Automatic speech emotion recognition, however, requires acoustic feature extraction. Since there is no information about which features are related to a speaker's emotions, many researchers have proposed a large number of features. Unfortunately, employing all of these may pose two basic challenges.