Pattern recognition and features selection for speech emotion recognition model using deep learning



Kittisak Jermsittiparsert¹ · Abdurrahman Abdurrahman² · Parinya Siriattakul³ · Ludmila A. Sundeeva⁴ · Wahidah Hashim⁵ · Robbi Rahim⁶ · Andino Maseleno⁷

Received: 12 November 2019 / Accepted: 17 February 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Automatic speaker recognition models are built on a foundation of speaker characterization models, pattern analysis, and engineering. This paper focuses on the effect of classification and feature selection methods on speech emotion recognition. Selecting the right parameters in combination with the classifier is an important part of reducing the computational complexity of the system, and this step becomes essential for models deployed in real-time scenarios. In this paper, a new deep learning based speech recognition model is presented that automatically recognizes spoken words. The quality of the input source, i.e. the speech signal in this case, has a direct impact on the accuracy the classifier can attain. The Berlin database consists of around 500 utterances recorded by both male and female speakers. On this dataset, the presented model achieves maximum accuracies of 94.21%, 83.54%, 83.65%, and 78.13% with MFCC, prosodic, LSP, and LPC features, respectively. The presented model offers better recognition performance than the other methods compared.

Keywords  Deep learning · Speech · Emotion recognition · Feature extraction
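The abstract names LPC among the four feature sets but does not describe how the features are extracted. As a rough illustration only, the sketch below computes per-frame LPC coefficients with the textbook autocorrelation method and the Levinson-Durbin recursion. The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate), the Hamming window, and the predictor order of 12 are assumptions chosen for illustration, and the function names `autocorrelation`, `levinson_durbin`, and `lpc_frames` are hypothetical; this is not claimed to be the authors' pipeline.

```python
import math
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations for the LPC predictor coefficients.

    Returns (a, err), where x[n] is predicted as sum(a[j] * x[n - 1 - j])
    for j = 0..order-1, and err is the residual prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            # Silent or perfectly predictable frame: leave remaining coefficients at 0.
            break
        # Reflection coefficient for the predictor of order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients from the previous stage's values.
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_frames(signal: List[float], order: int = 12,
               frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Split the signal into Hamming-windowed frames and return one LPC vector per frame.

    Assumed defaults: order 12, 25 ms frames with a 10 ms hop at 16 kHz.
    """
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
        features.append(coeffs)
    return features


if __name__ == "__main__":
    # Impulse response of the all-pole filter x[n] = 1.3 x[n-1] - 0.4 x[n-2];
    # the recursion should recover the two filter coefficients.
    h = [1.0, 1.3]
    for _ in range(198):
        h.append(1.3 * h[-1] - 0.4 * h[-2])
    a, _ = levinson_durbin(autocorrelation(h, 2), 2)
    print([round(c, 4) for c in a])  # expected: [1.3, -0.4]
```

Each row returned by `lpc_frames` holds one frame's predictor coefficients. Such per-frame vectors could be passed to a classifier directly or summarized per utterance (for example, by their mean and variance) beforehand; which option the paper uses is not stated here.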

1 Introduction

Improvements in applications and services have made natural communication between humans and machines increasingly attractive. Issuing commands through voice and gestures has become familiar in recent years. A large amount of data can be obtained with good accuracy from human speech, and speech also carries additional information about the speaker, such as age, gender, emotional state, speech impairments, and other characteristics. Declaring the input features is efficient because the speech features are simulated distinctly from one another by speakers with strong acting skills. As the title indicates, this paper concerns models for classifying emotion from human speech. Emotion is one of the crucial parameters in humans: it represents a person's mental state and has physiological effects, whereas the


* Andino Maseleno (corresponding author)

1 Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Physics Education Department, Lampung University, Tanjungkarang, Indonesia
3 School of Psychology, University of Queensland, Brisbane, Australia
4 Togliatti State University, Tolyatti, Russia
5 Institute of Informatics and Computing Energy, Universiti Tenaga Nasional, Kajang, Malaysia
6 Sekolah Tinggi Ilmu Man