Ageing Voices: The Effect of Changes in Voice Parameters on ASR Performance


Research Article

Ravichander Vipperla, Steve Renals, and Joe Frankel

The Center for Speech Technology Research, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK

Correspondence should be addressed to Ravichander Vipperla, [email protected]

Received 29 May 2009; Revised 10 November 2009; Accepted 4 January 2010

Academic Editor: Vijay Parsa

Copyright © 2010 Ravichander Vipperla et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With ageing, human voices undergo several changes, which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we examined the effect of these changes on Automatic Speech Recognition (ASR) and found that the Word Error Rate (WER) on older voices is 10% absolute higher than that on adult voices. Subsequently, we compared several voice source parameters, including fundamental frequency, jitter, shimmer, harmonicity, and cepstral peak prominence, of adult and older males. Several of these parameters show statistically significant differences between the two groups. However, artificially increasing the jitter and shimmer measures does not affect the ASR accuracies significantly. Artificially lowering the fundamental frequency degrades the ASR performance marginally, but this drop in performance can be overcome to some extent using Vocal Tract Length Normalisation (VTLN). Overall, we observe that the changes in the voice source parameters do not have a significant impact on ASR performance. Comparison of the likelihood scores of all the phonemes for the two age groups shows that there is a systematic mismatch in the acoustic space of the two age groups. Comparison of the phoneme recognition rates shows that mid vowels, nasals, and phonemes that depend on the ability to create constrictions with the tongue tip for articulation are more affected by ageing than other phonemes.
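Because the abstract reports the gap between the two groups as "10% absolute", it may help to see how WER is commonly computed and how an absolute difference differs from a relative one. The following is a minimal sketch of the standard word-level edit-distance definition, not the scoring tool used in this study; the function names and the example WER values of 0.30 and 0.40 are illustrative assumptions and are not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance divided by the
    reference length. Whitespace tokenisation is an assumption here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_and_relative_gap(wer_baseline: float, wer_other: float):
    """Return (absolute, relative) increase of wer_other over wer_baseline."""
    absolute = wer_other - wer_baseline
    relative = absolute / wer_baseline
    return absolute, relative


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Hypothetical WERs: a 10% absolute gap here is a one-third relative increase.
gap_abs, gap_rel = absolute_and_relative_gap(0.30, 0.40)
```

With these hypothetical values, the same gap reads as 10 percentage points in absolute terms but roughly 33% in relative terms, which is why the distinction is stated explicitly.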

1. Introduction

Older people form an important user group for a variety of spoken dialogue systems. Systems with speech-based interactions can be particularly useful for older people with mobility restrictions and visual impairment. One of the main challenges in developing such systems is to build Automatic Speech Recognition (ASR) systems that give good performance on older voices.

With ageing, several changes occur in the human speech production mechanism, which consists of the lungs, vocal cords, and the vocal cavities including the pharynx, mouth, and nose. In the respiratory system, loss of elasticity [1], stiffening of the thorax, reduction in respiratory muscle strength [2], and loss of diaphragm strength [3] are the most significant changes. These lead to a reduction in forced expiratory volume and lung pressure in older people, as a result of which there is a decline in the amount of air that moves in and out and the efficiency w