Experiments on Automatic Recognition of Nonnative Arabic Speech
Research Article
Yousef Ajami Alotaibi,1 Sid-Ahmed Selouani,2 and Douglas O’Shaughnessy3
1 Computer Engineering Department, King Saud University, Riyadh 11451, Saudi Arabia
2 Laboratoire de Recherche en Interactivité Homme Système (LARIHS), Université de Moncton, Campus de Shippagan, New Brunswick, Canada E8S 1P6
3 INRS-Energie-Matériaux-Télécommunications, Université du Québec, 800 de la Gauchetière Ouest, place Bonaventure, Montréal, Canada H5A 1K6
Correspondence should be addressed to Sid-Ahmed Selouani, [email protected]
Received 11 May 2007; Revised 5 October 2007; Accepted 13 January 2008
Recommended by Li Deng
The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. In addition, the nonnative speech data available for training are generally insufficient. Moreover, compared to other languages, the Arabic language has attracted relatively few research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes play a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in overall phoneme recognition is obtained when nonnative speakers are involved in both the training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than male nonnative speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by nonnative speakers, both male and female.
Copyright © 2008 Yousef Ajami Alotaibi et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Pronunciation variability is by far the most critical issue for Arabic automatic speech recognition (AASR). This is mainly due to the large number of nonnative accents and to the fact that the nonnative speech data available for training are generally insufficient. Hence, the modeling of separate accents remains difficult and inaccurate. In addition, the Arabic language is characterized by extreme dialectal variation and nonstandardized speech representations, since it is usually written without short vowels and other diacritics,