Noise effect on Amazigh digits in speech recognition system



Ouissam Zealouk¹ · Hassan Satori¹ · Naouar Laaidi¹ · Mohamed Hamidi¹ · Khalid Satori¹

* Hassan Satori
  [email protected]

¹ Laboratory Computer Science, Image Processing and Numerical Analysis, Faculty of Sciences Dhar Mahraz, Sidi Mohammed Ben Abbdallah University, B.P. 1796, Fez, Morocco

Received: 28 January 2020 / Accepted: 21 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Automatic Speech Recognition (ASR) for Amazigh speech, particularly Moroccan Tarifit accented speech, is a less researched area. This paper focuses on the analysis and evaluation of the first ten Amazigh digits under noisy conditions from an ASR perspective, based on the Signal to Noise Ratio (SNR). Our testing experiments were performed under two types of noise and repeated with added environmental noise at various SNR values ranging from 5 to 45 dB for each kind. Different formalisms are used to develop a speaker-independent Amazigh speech recognition system, such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs). The experimental results under noisy conditions show that performance degrades for all digits to different degrees, and that the recognition rates under the car noise environment decrease less than under the grinder conditions, with differences of 2.84% and 8.42% at SNRs of 5 dB and 25 dB, respectively. We also observed that the most affected digits are those which contain the "S" alphabet.

Keywords  Automatic speech recognition system · Amazigh language · Hidden Markov Model · Sphinx4 · Noise
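The paper does not describe the tool used to add noise at the SNR levels listed above (5 to 45 dB), so the following is only a minimal sketch of what mixing at a fixed SNR involves: the noise signal is scaled so that the speech-to-noise power ratio matches the target value. The names `mix_at_snr`, `clean_digit`, and `car_noise` are placeholders for illustration, not from the paper.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` so that the speech-to-noise power ratio
    of the result equals `snr_db` (in decibels)."""
    # Tile or truncate the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    # Average power of each signal.
    p_speech = np.mean(speech.astype(float) ** 2)
    p_noise = np.mean(noise.astype(float) ** 2)

    # Solve 10 * log10(p_speech / (g**2 * p_noise)) = snr_db for the noise gain g.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Hypothetical usage: corrupt a clean digit recording at the SNRs used in the paper.
# for snr in (5, 15, 25, 35, 45):
#     noisy = mix_at_snr(clean_digit, car_noise, snr)
```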

1 Introduction

Speech recognition is the process of converting a speech signal into a sequence of words by means of algorithms. Recently, it has become a popular input mechanism in several computer applications. The performance of Automatic Speech Recognition systems degrades considerably when speech is corrupted by background noise not seen during training, because the observed speech signal no longer matches the distributions derived from the training material. Many approaches aim at resolving this mismatch, such as normalizing or enhancing the speech features to remove the corrupting noise from the observations prior to recognition (Yu et al. 2008), compensating the acoustic models (Moreno et al. 1996; Gales and Young 1996), and using recognizer architectures that rely only on the least noisy observations (Raj and Stern 2005). Lee et al. (2009) combined speech enhancement with endpoint detection and speech/non-speech discrimination in a commercial application. Kim and Stern (2009) presented a new noise-robust front-end method and compared it under different noise conditions. Model adaptation methods leave the observations unaltered and instead update the recognizer model parameters to make them more representative of the observed speech, e.g. (Li et al. 2007; Hu et al. 2006; Seltzer et al. 2010). These approaches can be further enhanced by training on data from different conditions.
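One common instance of the feature normalization mentioned above is per-utterance cepstral mean and variance normalization; the sketch below shows this technique only as an illustration, not as the method used in the cited works or in this paper. The function name and the assumed (frames, coefficients) feature layout are choices made for the example.

```python
import numpy as np

def cepstral_mean_variance_normalization(feats: np.ndarray) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (CMVN).

    `feats` is a (num_frames, num_coefficients) array of MFCC-like features.
    Subtracting the utterance mean removes stationary channel effects, and
    dividing by the standard deviation reduces the scale mismatch between
    clean training data and noisy test data.
    """
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True) + 1e-10  # avoid division by zero
    return (feats - mean) / std
```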