Recognition of emotion from speech using evolutionary cepstral coefficients

Ali Bakhshi¹ ([email protected]) · Stephan Chalup¹ ([email protected]) · Ali Harimi² ([email protected]) · Seyed Mostafa Mirhassani³ ([email protected])

¹ School of Electrical Engineering and Computing, The University of Newcastle, Newcastle, Australia
² Department of Electrical Engineering, Islamic Azad University, Shahrood Branch, Shahrood, Iran
³ Department of Biomedical Engineering, University of Malaya, Kuala Lumpur, Malaysia

Received: 18 June 2019 / Revised: 5 May 2020 / Accepted: 11 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

An optimal representation of acoustic features is an ongoing challenge in automatic speech emotion recognition research. In this study, we propose cepstral coefficients based on evolutionary filterbanks as emotional features. It is difficult to guarantee that a single optimized filterbank provides the best representation for emotion classification. Consequently, we employed six HMM-based binary classifiers, each using a dedicated filterbank optimized by a genetic algorithm, to categorize the data into seven emotion classes. Applied in a hierarchical manner, these optimized classifiers outperformed conventional Mel Frequency Cepstral Coefficients (MFCCs) in overall emotion classification accuracy: the proposed evolutionary cepstral coefficients achieved a weighted average recall of 87.29% on the Berlin database, while the same approach with conventional cepstral features achieved only 79.63%.

Keywords Genetic algorithm · Mel filterbank · Cepstral coefficients · Speech emotion recognition
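The paper's pipeline is not reproduced as code here; the following is a minimal sketch of how a genetic algorithm can evolve the centre frequencies of a filterbank for one binary classifier. The population size, operators, and toy fitness function are illustrative assumptions, not the paper's settings; in the actual method the fitness would be the accuracy of the HMM-based binary classifier trained on cepstral features extracted with the candidate filterbank.

```python
# Sketch of a GA evolving filterbank centre frequencies (assumptions noted).
import numpy as np

rng = np.random.default_rng(0)

N_FILTERS = 24               # filters per bank (assumed, not from the paper)
F_MIN, F_MAX = 0.0, 8000.0   # analysis band in Hz (assumed)
POP_SIZE, N_GEN = 30, 50     # GA settings (assumed)

def random_bank():
    """A candidate filterbank: sorted centre frequencies in [F_MIN, F_MAX]."""
    return np.sort(rng.uniform(F_MIN, F_MAX, N_FILTERS))

def fitness(bank):
    """Placeholder fitness. In the paper's method this would be: extract
    cepstral coefficients with `bank`, train/evaluate the binary HMM
    classifier, and return its accuracy. The toy version below merely
    rewards smooth spacing on a Mel-like warped axis."""
    warped = 2595.0 * np.log10(1.0 + bank / 700.0)
    target = np.linspace(warped.min(), warped.max(), N_FILTERS)
    return -np.mean((warped - target) ** 2)

def crossover(a, b):
    """Uniform crossover over centre frequencies, re-sorted to stay valid."""
    mask = rng.random(N_FILTERS) < 0.5
    return np.sort(np.where(mask, a, b))

def mutate(bank, rate=0.1, scale=100.0):
    """Perturb a random subset of centres by Gaussian noise, clip to band."""
    noise = rng.normal(0.0, scale, N_FILTERS) * (rng.random(N_FILTERS) < rate)
    return np.sort(np.clip(bank + noise, F_MIN, F_MAX))

population = [random_bank() for _ in range(POP_SIZE)]
for _ in range(N_GEN):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: POP_SIZE // 2]          # truncation selection
    children = [mutate(crossover(parents[rng.integers(len(parents))],
                                 parents[rng.integers(len(parents))]))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("evolved centre frequencies (Hz):", np.round(best, 1))
```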

1 Introduction

Automatic speech emotion recognition (SER) has been an attractive research area over the last decade. Human emotions are partially encoded in speech prosody.


Most acoustic features employed for SER can be categorized into two major groups: prosodic and spectral. Pitch (F0) and intensity are among the most prominent prosodic features, while MFCCs (Mel Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction) coefficients, and formants are the most important spectral features. In the literature, pitch and energy have been reported as standard emotional features [3, 51, 52]. Spectral features, which are mainly extracted from the sub-band spectrum of speech, have also been shown to be complementary to prosodic features [24]. The authors of [58] derived spectral patterns for SER from speech spectrograms divided according to the Bark scale [83]. The equivalent rectangular bandwidth (ERB) offers an unrealistic but convenient simplification, using rectangular band-pass filters to extract sub-band spectral features [29]. To model the perception of speech in a manner similar to the human ear, the Mel frequency scale is linear below 1 kHz and logarithmic above it [29]. MFCCs are the most widespread spectral features in speech processing.
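For reference, below is a short sketch of the conventional Mel warping and triangular filterbank that underlie MFCC extraction. The 2595/700 constants follow the common HTK-style convention, and the sampling rate and filter count are illustrative defaults, not values taken from this paper.

```python
# Conventional Mel filterbank: roughly linear below 1 kHz, logarithmic above.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters with centres equally spaced on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising slope of the triangle
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling slope of the triangle
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc_from_power(power_spec, fb, n_ceps=13):
    """Log filterbank energies followed by a DCT (the cepstral step)."""
    log_energies = np.log(fb @ power_spec + 1e-10)
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]

# Usage: one frame of a power spectrum (|FFT|^2, first n_fft//2 + 1 bins).
frame = np.abs(np.fft.rfft(np.random.randn(512))) ** 2
print(mfcc_from_power(frame, mel_filterbank()).round(3))
```

The paper's evolutionary cepstral coefficients follow the same extraction chain; the difference is that the filter placement is learned by the genetic algorithm rather than fixed by the Mel warping above.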