Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition


Institute of Intelligent Information Processing, Taizhou University, Taizhou, China [email protected]

Abstract. Speech emotion recognition is an interesting and challenging subject due to the emotion gap between speech signals and high-level speech emotion. To bridge this gap, this paper presents a method of Chinese speech emotion recognition using Deep Belief Networks (DBN). DBN is used to perform unsupervised feature learning on the extracted low-level acoustic features. A Multi-Layer Perceptron (MLP) is then initialized with the learned weights of the hidden layers of the DBN and employed for Chinese speech emotion classification. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) show that the presented method obtains a classification accuracy of 32.80 % and a macro average precision of 41.54 % on the testing data of the CHEAVD dataset, significantly outperforming the baseline results provided by the organizers of the speech emotion recognition sub-challenges.

Keywords: Deep learning · Deep belief networks · Speech emotion recognition · Feature learning
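The abstract mentions unsupervised feature learning on extracted low-level acoustic features. As a rough, self-contained illustration (not the feature set used by the authors, who rely on a richer acoustic configuration), two common frame-level descriptors, short-time log energy (prosody-related) and zero-crossing rate (spectral-related), can be computed and summarized into an utterance-level vector like this; the frame length, hop size, and function names are illustrative choices:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (e.g. 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def low_level_features(x):
    """Utterance-level statistics over two frame-level acoustic contours."""
    frames = frame_signal(x)
    # short-time log energy per frame (prosody-related contour)
    energy = np.log((frames ** 2).sum(axis=1) + 1e-10)
    # zero-crossing rate per frame (coarse spectral descriptor)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    # collapse the contours into a fixed-length utterance-level vector
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)        # one second of synthetic 16 kHz "audio"
feat = low_level_features(x)
print(feat.shape)                     # (4,)
```

Vectors of this kind (typically with many more descriptors and statistical functionals) form the low-level input that the DBN then refines through unsupervised feature learning.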

1 Introduction

During the past two decades, massive efforts have been made to recognize human emotions from emotional speech signals, a task called speech emotion recognition. At present, speech emotion recognition has attracted much interest in various fields such as signal processing, pattern recognition, and artificial intelligence, since it can be applied to human-machine interaction [1, 2].

Feature extraction is a critical step in bridging the emotion gap between speech signals and high-level speech emotion. Up to now, a variety of features have been employed for speech emotion recognition [3, 4]. These features can be roughly divided into four categories: (1) acoustic features, such as prosody features, voice quality features, and spectral features; (2) language features, such as lexical information; (3) context information, such as subject, gender, and culture influences; (4) hybrid features, i.e., the integration of two or three of the feature types above. However, for these hand-designed features there is no agreement on which one sufficiently and efficiently characterizes emotion in speech signals. In addition, these hand-designed features are low-level and hence may not be reliable enough to efficiently characterize the subjective emotion in complicated scenarios. It is thus important to develop automatic feature learning algorithms for speech emotion recognition.

© Springer Nature Singapore Pte Ltd. 2016. T. Tan et al. (Eds.): CCPR 2016, Part II, CCIS 663, pp. 645–651, 2016. DOI: 10.1007/978-981-10-3005-5_53

In recent years, deep learning [5], built on multi-layered deep architectures, has attracted extensive attention in machine learning, signal processing, artificial intelligence, and pattern recognition. Deep belief networks (DBN) [6], as a representative deep learning method, exhibit a strong ability for unsupervised feature learning. In rec
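To make the DBN idea concrete: a DBN is pretrained greedily as a stack of Restricted Boltzmann Machines (RBMs), each layer learning features of the previous layer's activations without labels; the learned weights can then initialize an MLP for supervised fine-tuning, as the abstract describes. The following is a minimal sketch of this pretraining scheme using one-step contrastive divergence (CD-1), assuming inputs scaled to [0, 1]; the layer sizes, learning rate, and epoch count are illustrative, not the authors' configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Train one RBM layer with CD-1; returns hidden weights and biases."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # positive phase: hidden probabilities given the data
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)  # sample hidden units
        # negative phase: one Gibbs step back to the visible layer and up again
        p_v = sigmoid(h @ W.T + b_v)
        p_h2 = sigmoid(p_v @ W + b_h)
        # CD-1 updates: difference of positive and negative correlations
        W += lr * (data.T @ p_h - p_v.T @ p_h2) / len(data)
        b_h += lr * (p_h - p_h2).mean(axis=0)
        b_v += lr * (data - p_v).mean(axis=0)
    return W, b_h

# toy low-level "acoustic feature" vectors scaled to [0, 1]
X = rng.random((100, 20))
W1, b1 = train_rbm(X, 8)             # first DBN layer
H1 = sigmoid(X @ W1 + b1)            # learned first-layer features
W2, b2 = train_rbm(H1, 4)            # second layer, trained on those features
# (W1, b1) and (W2, b2) would then initialize an MLP's hidden layers,
# which is fine-tuned with emotion labels via backpropagation
```

The key design point is that pretraining gives the MLP a data-driven starting point in weight space, which is the role the DBN plays in the recognition pipeline described here.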