Speech Emotion Recognition Using Convolutional Neural Network and Long-Short Term Memory
Ranjana Dangol 1,2 · Abeer Alsadoon 1,2 · Omar Hisham Alsadoon 3 · P. W. C. Prasad 1,2 · Indra Seher 1,2
Received: 21 March 2020 / Revised: 31 July 2020 / Accepted: 21 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Human-robot interaction involves human intentions and emotions. Since the emergence of positive psychology, psychological research has concentrated heavily on the factors involved in human emotion generation. Speech emotion recognition (SER) is a challenging task due to the complexity of emotions, and it is gaining importance because good emotional health supports good social and mental health. Although there are different approaches to SER, the most advanced models combine a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. However, these models suffer from a lack of sequence parallelization and from long computation times. Attention mechanisms, meanwhile, perform far better at learning significant feature representations for specific tasks. Building on this technique, we propose an emotion recognition system with a relation-aware self-attention mechanism that memorizes discriminative features for SER, using spectrograms as input. A CNN with relation-aware self-attention is modelled to analyse 3D log-Mel spectrograms and extract high-level features. The model uses several kinds of layers: 3D convolutional layers, 3D max-pooling layers, and LSTM networks. The attention layer emphasizes the emotionally salient parts of an utterance and assembles discriminative utterance-level representations for SER. Finally, a fully connected layer with 64 output units processes the utterance-level representations to obtain higher-level representations. The proposed relation-aware attention-based 3D CNN and LSTM model achieved an average speech emotion recognition accuracy of 80.80%. The model focuses on enhancing the attention mechanism to gain the additional benefit of sequence-to-sequence parallelization while improving recognition accuracy.

Keywords Deep Learning . Convolutional Neural Network (CNN) . Long Short-Term Memory (LSTM) . Relation-aware self-attention mechanism . Hierarchical Spectral Clustering (HSC) . Speech Emotion Recognition (SER)

* Abeer Alsadoon [email protected]

Extended author information available on the last page of the article
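The pipeline the abstract describes (3D convolutions and 3D max-pooling over log-Mel spectrogram blocks, an LSTM over the resulting time sequence, self-attention over the LSTM outputs, and a fully connected layer with 64 units) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: all layer sizes and the input shape are assumptions, and standard scaled-dot-product self-attention (PyTorch's MultiheadAttention) stands in for the paper's relation-aware variant.

```python
import torch
import torch.nn as nn


class SERSketch(nn.Module):
    """Illustrative 3D-CNN + LSTM + self-attention model for SER.

    Input is assumed to be a batch of 3D log-Mel spectrogram blocks of
    shape (batch, 1, time, 40 mel bands, 32 frames per block); these
    dimensions are assumptions for the sketch, not taken from the paper.
    """

    def __init__(self, n_classes: int = 7):
        super().__init__()
        # Two 3D conv + 3D max-pool stages extract high-level features;
        # pooling halves only the spectral dimensions, preserving time.
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        # LSTM models temporal dynamics of the flattened conv features.
        self.lstm = nn.LSTM(input_size=32 * 10 * 8, hidden_size=128,
                            batch_first=True)
        # Plain self-attention over time (stand-in for relation-aware
        # self-attention) weights the emotionally salient segments.
        self.attn = nn.MultiheadAttention(embed_dim=128, num_heads=4,
                                          batch_first=True)
        # Fully connected layer with 64 output units, then classification.
        self.fc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                nn.Linear(64, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)                       # (B, 32, T, 10, 8)
        b, c, t, f, w = h.shape
        h = h.permute(0, 2, 1, 3, 4).reshape(b, t, c * f * w)
        h, _ = self.lstm(h)                    # (B, T, 128)
        h, _ = self.attn(h, h, h)              # attention over time steps
        return self.fc(h.mean(dim=1))          # utterance-level logits


model = SERSketch()
logits = model(torch.randn(2, 1, 5, 40, 32))   # 2 utterances, 5 blocks each
```

Mean-pooling the attended sequence is one simple way to form the utterance-level representation; the paper's attention layer aggregates segments by learned weights instead.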
Multimedia Tools and Applications
1 Introduction

In the context of emotion recognition, speech plays an important role in identifying a human's emotional state. It is well known that one person can infer another person's emotions by judging their expressions, voice, and gestures [1, 4, 16]. When a person goes through emotional imbalance, changes in voice quality and behaviour are evident, along with an increased heart rate and rising