Objective and Subjective Evaluation of an Expressive Speech Corpus




Abstract. This paper presents the validation of the expressiveness of an acted oral corpus produced for use in speech synthesis. Firstly, an objective validation has been conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test has been performed with a subset of utterances. The relationship between the objective and subjective evaluations is analyzed, and the conclusions obtained can help to improve the subsequent steps of expressive speech synthesis.

1 Introduction

There is a growing tendency toward the use of speech in human-machine interaction through the incorporation of automatic speech recognition and speech synthesis. The recognition of emotional states or the synthesis of emotional speech can improve communication by making it more natural [1]. Therefore, one of the most important challenges in the study of expressive speech is the development of oral corpora with authentic emotional content that enable robust analysis according to the task for which they are developed. An exhaustive summary of the available databases for the study of emotional speech is beyond the scope of the present work, since comprehensive surveys have recently appeared in the literature. In [2], a new compilation of 48 databases is presented, showing a notable increase in multimodal databases. In [3], the databases used in 14 experiments on automatic emotion detection are summarized. Finally, [4] reviews 64 databases of emotional speech, providing a basic description of each one and its application.

Section 2 introduces different aspects of expressive speech. Section 3 explains the production of our corpus. Section 4 details the objective validation, carried out using automatic emotion identification techniques. Section 5 concerns the subjective evaluation by means of a listening test, and finally, the conclusions are presented in Section 6.

This work has been partially supported by the European Commission, project SALERO FP6 IST-4-027122-IP.

M. Chetouani et al. (Eds.): NOLISP 2007, LNAI 4885, pp. 86–94, 2007. © Springer-Verlag Berlin Heidelberg 2007

2 Building Emotional Speech Corpora

According to [5], four aspects have to be considered when building emotional speech corpora: i) the scope (number, gender and age of speakers, language, dialects, and emotional states); ii) the context in which an utterance takes place (emotional significance related to semantics, prosody, facial expression and gestures); iii) the descriptors that represent the linguistic, emotional and acoustic content of the speech; and iv) the naturalness, which depends on the strategy followed to obtain the emotional speech. With respect to the latter, the main debate centers on the compromise between authenticity and audio quality. Campbell [1] and Schröder [6] propose four emotional speech sources: Natural occurre