Low-complexity disordered speech quality estimation
Yousef S. Ettomi Ali¹ · Vijay Parsa¹,² · Phillip Doyle² · Soulaimane Berkane³
Received: 11 June 2019 / Accepted: 11 February 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020, corrected publication 2020
Abstract Tracheoesophageal (TE) speech is generated by patients who have undergone a total laryngectomy, in which the larynx (voice box) is removed and replaced by a tracheoesophageal puncture. This work presents a novel low-complexity algorithm to estimate the degree of severity of disordered TE speech. The proposed algorithm has two output scores, which are computed from 20 ms voiced frames of the speech signal. An 18th-order Linear Prediction (LP) analysis is performed on each voiced frame. The first output score uses features derived from high order statistics (mean, variance, skewness and kurtosis), which are calculated from the LP coefficients, the cepstral coefficients and the LP residual signal. These high order statistics (HOS), along with the pitch value, are averaged over all voiced frames, yielding a total of 14 HOS quality features. The second output score is based on features derived from the estimated vocal tract model parameters (cross-sectional tube areas). Statistical vocal tract parameters (VTPs) across all voiced speech frames are used as speech quality features. Forward stepwise regression and K-fold cross-validation are then used to select the best sets of features to be fed to the regression models. The results show high correlations with subjective scores for several regression techniques, reaching a correlation of up to 0.91 when the VTP-Gaussian model is used.
Keywords Tracheoesophageal speech · Speech quality · Linear prediction · Vocal tract parameters
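To make the per-frame processing described in the abstract more concrete, the following is a minimal sketch in Python of LP analysis followed by the four statistics on the LP coefficients, the LP cepstrum and the LP residual. It is an illustration under stated assumptions, not the authors' implementation: the autocorrelation method with Levinson-Durbin recursion, the Hamming window, the non-excess kurtosis, the 16 kHz sampling rate in the usage note, and all function names are assumptions not given in the text. Voicing detection, pitch estimation, the vocal tract area features and the regression stage are left out, so the sketch returns 12 values per frame and does not reproduce the exact 14-feature set.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation lags r[0..order].

    Returns (a, k, err): the prediction-error filter a (with a[0] = 1, so
    A(z) = 1 + sum_j a[j] z^-j), the reflection coefficients k, and the
    final prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        k[i - 1] = ki
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + ki * a_prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k, err

def lp_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(m * c[m] * a[n - m] for m in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def hos(x):
    """Mean, variance, skewness and (non-excess) kurtosis of a 1-D array."""
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    if var == 0.0:
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (x - mu) / np.sqrt(var)
    return np.array([mu, var, np.mean(z ** 3), np.mean(z ** 4)])

def frame_features(frame, order=18):
    """Statistics of LP coefficients, LP cepstrum and LP residual (12 values)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))  # assumed window
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]  # lags 0..order
    r[0] += 1e-12  # guard against an all-zero frame
    a, _, _ = levinson_durbin(r, order)
    residual = np.convolve(x, a)[:n]  # e[n] = x[n] + sum_j a[j] x[n-j]
    ceps = lp_cepstrum(a, order)
    return np.concatenate([hos(a[1:]), hos(ceps), hos(residual)])
```

As a usage example, at an assumed 16 kHz sampling rate a 20 ms frame holds 320 samples, so `frame_features(frame)` on such a frame returns a vector of 12 statistics. Averaging these vectors over all voiced frames of an utterance would then give utterance-level features of the kind the abstract describes.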
* Yousef S. Ettomi Ali
¹ Department of Electrical and Computer Engineering, University of Western Ontario, London, ON, Canada
² School of Communications and Speech Disorders, University of Western Ontario, London, ON, Canada
³ Department of Computer Sciences and Engineering, University of Quebec in Outaouais, Gatineau, QC, Canada

1 Introduction
Voice and speech quality estimation is an important topic of research with many applications in telecommunication and biomedical engineering. Early algorithms that assess voice and speech quality were developed in the telecommunication industry to evaluate the performance of telecommunication channels, the accuracy of speech coding algorithms, and often the efficiency of speech enhancement methods (Union 1996; Rix et al. 2001; Malfait et al. 2006; Beerends et al. 2013). In the biomedical field, voice and speech quality estimation algorithms were developed to evaluate the severity of dysphonia (abnormality in the perceived quality of voice production) (Awan et al. 2010) and the associated voice quality of pathological speech (Parsa and Jamieson 2001; Ritchings et al. 2002; Gu et al