On the Impact of Children's Emotional Speech on Acoustic and Language Models
Research Article

Stefan Steidl,¹ Anton Batliner,¹ Dino Seppi,² and Björn Schuller³

¹ Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstraße 3, 91058 Erlangen, Germany
² ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium
³ Institute for Human-Machine Communication, Technische Universität München, Arcisstraße 21, 80333 München, Germany
Correspondence should be addressed to Stefan Steidl, [email protected]

Received 2 June 2009; Revised 9 October 2009; Accepted 23 November 2009

Academic Editor: Georg Stemmer

Copyright © 2010 Stefan Steidl et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The automatic recognition of children’s speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous motherese, emphatic, and angry children’s speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, “emotional” speech recognisers are compared to a speech recogniser trained on neutral speech only; for this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a “neutral” speech recogniser trained on large amounts of neutral speech is adapted by adding a small amount of emotionally coloured data during training. The results show that emphatic and angry speech are recognised best, even better than neutral speech, and that performance can be improved further by adaptation of the acoustic and linguistic models. In order to show the variability of emotional speech, we visualise the distribution of the four emotion-related states in MFCC space by applying a Sammon transformation.
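The Sammon transformation mentioned above projects high-dimensional feature vectors onto a plane while preserving the pairwise distances of the original space as well as possible. The following is a minimal sketch in NumPy that minimises Sammon’s stress by plain gradient descent; the learning rate, iteration count, and random initialisation are illustrative assumptions, not the configuration used in this article.

```python
import numpy as np

def sammon_2d(X, n_iter=500, lr=0.1, eps=1e-9):
    """Project the rows of X to 2-D by minimising Sammon's stress."""
    n = X.shape[0]
    # pairwise Euclidean distances in the original (e.g. MFCC) space
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    D = np.maximum(D, eps)                 # guard against zero distances
    c = D[np.triu_indices(n, 1)].sum()     # normalisation constant
    Y = np.random.default_rng(0).normal(scale=1e-2, size=(n, 2))
    for _ in range(n_iter):
        d = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
        d = np.maximum(d, eps)
        w = (D - d) / (D * d)              # zero on the diagonal by construction
        # gradient of Sammon's stress with respect to the 2-D coordinates
        grad = (-2.0 / c) * (w[:, :, None] * (Y[:, None] - Y[None, :])).sum(axis=1)
        Y -= lr * grad                     # plain gradient-descent step
    return Y
```

To obtain a visualisation in the spirit of the article, one could, for instance, average the MFCC frames of each token, project these mean vectors with sammon_2d, and colour the resulting points by their emotion-related state; this per-token averaging is an illustrative choice, not the authors’ procedure.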
1. Introduction

Offering a broad variety of applications, such as literacy and reading tutors [1, 2], speech interfaces for children are an attractive subject of research [3]. However, children’s speech is known to be a challenge for automatic speech recognition (ASR) [4–8]: both the acoustic and the linguistic characteristics of children’s speech differ from those of adults [9], for example, through higher pitch and formant positions or not yet fully developed coarticulation. At the same time, these characteristics vary strongly across children of different ages, owing to anatomical and physiological development [10] and to learning effects. In [11], voice transformations are applied successfully to increase performance on children’s speech when an adult speech recogniser is used; a toy illustration of this idea is sketched below. Apart from children’s speech, affective speech can also be challenging for a speech recogniser.
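The transformations of [11] are not reproduced here, but the underlying idea of moving a child’s voice closer to the acoustic range an adult-trained recogniser expects can be illustrated with a simple pitch shift. The sketch below assumes librosa; the file name and the shift of four semitones are purely illustrative assumptions, not values taken from [11].

```python
import librosa

# Load a (hypothetical) child's utterance at 16 kHz.
y, sr = librosa.load("child_utterance.wav", sr=16000)

# Shift the pitch down by four semitones so that the signal's
# frequency range is closer to typical adult speech before it is
# passed to an adult-trained recogniser. The shift size is an
# illustrative assumption, not the method of [11].
y_adult_like = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)
```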