Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition


Gaurav Agarwal 1 & Hari Om 1

Received: 29 January 2020 / Revised: 8 September 2020 / Accepted: 19 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper proposes a speech emotion recognition technique based on an optimized Deep Neural Network. The speech signals are denoised with a novel adaptive wavelet transform combined with a modified galactic swarm optimization algorithm (AWT_MGSO). From the denoised speech signals, spectral features such as LPC (Linear Prediction Coefficients), MFCC (Mel Frequency Cepstral Coefficients) and PSD (Power Spectral Density), and prosodic features such as energy, entropy, formant frequencies and pitch, are extracted, and a subset of these features is selected by ASFO (Adaptive Sunflower Optimization Algorithm). An optimized DNN-DHO (Deep Neural Network with Deer Hunting Optimization Algorithm) is proposed for emotion classification, and an enhanced squirrel search algorithm is proposed to update the weights in the optimized DNN-DHO classifier. In this study, all eight emotions in the speech of the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) databases for English and the IITKGP-SEHSC (Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus) database for Hindi are classified. The experimental results are compared with those of classifiers such as DNN-DHO, DNN (Deep Neural Network) and DAE (Deep Auto Encoder). The results show that the proposed algorithm attains a maximum accuracy of 97.85% on the TESS dataset, 97.14% on the RAVDESS dataset and 93.75% on the IITKGP-SEHSC dataset with the DNN-DHO classifier.

Keywords Speech emotion recognition · Adaptive wavelet transform · Modified galactic swarm optimization · Adaptive sunflower optimization algorithm · Optimized deep neural network · Deer hunting optimization algorithm
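As a rough illustration of three of the features named in the abstract (energy, PSD and entropy), the following minimal sketch computes them per frame with NumPy. It is not the authors' implementation: the function names, the Hann window, the frame sizes and the use of spectral entropy as the entropy measure are all assumptions made for this example, and the denoising (AWT_MGSO), the remaining features (LPC, MFCC, formants, pitch), the ASFO selection and the DNN-DHO classifier are left out.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Prosodic feature: sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def power_spectrum(frames, sample_rate):
    """Spectral feature: Hann-windowed periodogram of each frame.

    Assumption: a plain periodogram stands in for the PSD estimate;
    the one-sided factor of 2 is deliberately omitted for brevity.
    """
    window = np.hanning(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum) ** 2 / (sample_rate * np.sum(window ** 2))


def spectral_entropy(psd, eps=1e-12):
    """Shannon entropy (bits) of each frame's normalised power spectrum."""
    p = psd / (np.sum(psd, axis=1, keepdims=True) + eps)
    return -np.sum(p * np.log2(p + eps), axis=1)


if __name__ == "__main__":
    sample_rate = 16000                      # hypothetical sampling rate
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)     # synthetic stand-in for speech
    frames = frame_signal(tone, frame_len=400, hop=160)
    psd = power_spectrum(frames, sample_rate)
    features = np.column_stack(
        [short_time_energy(frames), spectral_entropy(psd)]
    )
    print(features.shape)
```

Per-frame values such as these would typically be stacked with the other features into one vector per utterance before any selection step; how the paper aggregates them is not stated in the abstract.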

* Gaurav Agarwal [email protected]

1 Department of Computer Science & Engineering, IIT(ISM), Dhanbad, Jharkhand 826004, India

Multimedia Tools and Applications

1 Introduction

Speech emotion recognition has recently become an active research domain concerned with recognizing a speaker's emotional state through speech analysis systems [5]. Speech emotion recognition (SER) refers to the automatic recognition of human emotion. It helps to enrich next-generation AI with emotional intelligence by grasping emotion from the voice and the words. It is believed that SER enhances the performance of speech recognition systems by extracting useful semantics from speech. Speech carries more information than the speaker's identity and the spoken words alone. SER has gained popularity and wider use with the development of artificial intelligence and intelligent assistants such as Apple's Siri, Microsoft's Cortana and Amazon's Alexa [17]. SER is a simulation model in which a computer recognizes as well as realizes human fe