Analysis of algorithms to estimate glottal closure instants from speech signals
G. Anushiya Rachel¹ · P. Vijayalakshmi² · T. Nagarajan²
Received: 7 October 2019 / Accepted: 2 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Estimation of glottal closure instants (GCIs) plays a vital role in pitch-synchronous speech processing. The current work performs a qualitative and quantitative review of six existing GCI estimation algorithms, namely the group delay (GD)-based algorithm, DYPSA, YAGA, ZFF, SEDREAMS, and the DPI algorithm. This paper differs from existing reviews in that it presents a detailed analysis of the parameters affecting each algorithm. The optimized parameter set derived from this analysis is then used to compare the algorithms. In addition to clean and noisy speech, performance on telephone speech is analyzed. The algorithms are also evaluated on pathological speech to analyze their performance in the presence of pitch jitter. In terms of identification rate, the DPI algorithm outperforms the other algorithms on clean speech, while SEDREAMS and ZFF are observed to be highly robust to noise. On telephone speech, however, DYPSA and the GD-based algorithm exhibit superior performance. The GD-based algorithm also performs better than the other algorithms in the presence of pitch jitter. Finally, the algorithms are compared in terms of computation time, and ZFF is observed to be the fastest.
Keywords Glottal closure instants · Epochs · Instants of excitation · Clean speech · Noisy speech · Telephone speech · Pathological speech
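The comparisons above are stated in terms of identification rate. This excerpt does not spell out the evaluation protocol, so the sketch below assumes the definition widely used in the GCI literature: each reference GCI defines a larynx cycle running from the midpoint with its predecessor to the midpoint with its successor; a cycle is identified if it contains exactly one estimated GCI, missed if it contains none, and a false alarm if it contains more than one. Identification accuracy (IDA) is taken as the standard deviation of the timing error over identified cycles. The function names `larynx_cycles` and `gci_metrics` are hypothetical, and GCIs are assumed to be given as sample indices.

```python
from statistics import pstdev


def larynx_cycles(ref):
    """Return (start, reference GCI, end) for each interior reference GCI.

    Assumption: a cycle spans the midpoints to the neighbouring reference
    GCIs, so the first and last reference GCIs get no cycle of their own.
    """
    return [
        ((ref[k - 1] + ref[k]) / 2, ref[k], (ref[k] + ref[k + 1]) / 2)
        for k in range(1, len(ref) - 1)
    ]


def gci_metrics(ref, est):
    """Identification rate (IDR), miss rate (MR) and false-alarm rate (FAR),
    all in percent, plus identification accuracy (IDA, in samples).

    ref, est: GCI locations in samples (hypothetical interface).
    """
    cycles = larynx_cycles(ref)
    if not cycles:
        raise ValueError("need at least three reference GCIs")
    hits = misses = false_alarms = 0
    errors = []  # timing error of the single estimate in each identified cycle
    for start, r, end in cycles:
        # Half-open window so an estimate on a boundary is counted only once.
        inside = [e for e in est if start <= e < end]
        if len(inside) == 1:
            hits += 1
            errors.append(inside[0] - r)
        elif not inside:
            misses += 1
        else:
            false_alarms += 1
    n = len(cycles)
    return {
        "IDR": 100 * hits / n,
        "MR": 100 * misses / n,
        "FAR": 100 * false_alarms / n,
        # Population standard deviation of the errors; NaN if nothing was identified.
        "IDA": pstdev(errors) if errors else float("nan"),
    }
```

Dividing IDA by the sampling rate converts it to seconds. For example, `gci_metrics([100, 180, 260, 340, 420, 500], [183, 257, 262, 336, 470])` gives IDR = 50.0, MR = 25.0, FAR = 25.0 and IDA = 3.5: the second cycle holds two estimates, the fourth holds none, and the other two are identified with errors of +3 and -4 samples.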
* Corresponding author: G. Anushiya Rachel
¹ Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India
² Speech Lab, SSN College of Engineering, Chennai, India

1 Introduction
The estimation of glottal closure instants (GCIs) plays a vital role in pitch-synchronous speech processing. A popular technique that relies on accurate estimates of GCIs, or pitch marks, is time-domain pitch-synchronous overlap-add (TD-PSOLA), which is commonly used to modify the prosody of speech, as in Rao (2012) and Anushiya Rachel et al. (2014, 2015). The estimation of GCIs can also aid in the detection and diagnosis of vocal fold pathologies. Another application is the artificial bandwidth extension of telephone speech (Thomas et al. 2010). Other applications that rely on the location of GCIs include speech dereverberation (Thomas et al. 2007; Gaubitch and Naylor 2007), glottal source modeling (Wong et al. 1979; Thomas et al. 2009), causal-anticausal deconvolution (Drugman et al. 2012), closed-phase inverse filtering (Gudnason et al. 2014), and speech synthesis (Stylianou 2001; Drugman et al. 2009). Most algorithms that estimate GCIs from speech signals initially attempt