Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition

Gaurav Agarwal 1 & Hari Om 1

Received: 29 January 2020 / Revised: 8 September 2020 / Accepted: 19 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper proposes a speech emotion recognition technique based on an optimized deep neural network. The speech signals are denoised with a novel adaptive wavelet transform combined with a modified galactic swarm optimization algorithm (AWT_MGSO). From the denoised speech signals, spectral features such as LPC (linear prediction coefficients), MFCC (Mel-frequency cepstral coefficients) and PSD (power spectral density), and prosodic features such as energy, entropy, formant frequencies and pitch, are extracted, and a subset of these features is selected by ASFO (adaptive sunflower optimization algorithm). An optimized DNN-DHO (deep neural network with deer hunting optimization algorithm) is proposed for emotion classification, and an enhanced squirrel search algorithm is proposed to update the weights in the optimized DNN-DHO classifier. In this study, all eight emotions in speech from the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) databases for English and the IITKGP-SEHSC (Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus) database for Hindi are classified. The experimental results are compared with classifiers such as DNN-DHO, DNN (deep neural network) and DAE (deep autoencoder). The proposed algorithm obtains a maximum accuracy of 97.85% on the TESS dataset, 97.14% on the RAVDESS dataset and 93.75% on the IITKGP-SEHSC dataset with the DNN-DHO classifier.

Keywords Speech emotion recognition · Adaptive wavelet transform · Modified galactic swarm optimization · Adaptive sunflower optimization algorithm · Optimized deep neural network · Deer hunting optimization algorithm
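As an illustration of two of the prosodic features named in the abstract (energy and pitch), the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not state how these features are computed, so the framing scheme, the autocorrelation pitch estimator, the function names and the 50 to 500 Hz search range are all assumptions made for this example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (assumed framing scheme)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def autocorrelation_pitch(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation.

    f_min and f_max bound the search range; both values are assumptions.
    Returns 0.0 when no lag has positive autocorrelation (e.g. silence).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

if __name__ == "__main__":
    sample_rate = 8000
    # Synthetic 200 Hz tone standing in for a voiced speech segment.
    tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
    for frame in frame_signal(tone, frame_len=400, hop=200):
        print(short_time_energy(frame), autocorrelation_pitch(frame, sample_rate))
```

In a full pipeline, per-frame values like these would be concatenated with the spectral features (LPC, MFCC, PSD) before feature selection. Real speech would also need a voicing decision, which this sketch omits.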

* Gaurav Agarwal

1 Department of Computer Science & Engineering, IIT(ISM), Dhanbad, Jharkhand 826004, India

Multimedia Tools and Applications

1 Introduction

Speech emotion recognition has recently become an active research domain concerned with recognizing a speaker's emotional state through speech analysis systems [5]. Speech emotion recognition (SER) refers to the automatic recognition of human emotion from speech. It helps enrich next-generation AI with emotional intelligence by capturing emotion from voice and words. SER is believed to enhance the performance of speech recognition systems by extracting useful semantics from speech. Speech carries more information than the speaker's identity and the spoken words alone. SER has gained popularity and wider use with the development of artificial intelligence and intelligent assistants such as Apple's Siri, Microsoft's Cortana and Amazon's Alexa [17]. SER is a simulation model in which a computer recognizes as well as realizes human fe
