Toward Improving the Performance of Epoch Extraction from Telephonic Speech

Krishna Gurugubelli¹ · Mohammad Hashim Javid¹ · K. N. R. K. Raju Alluri¹ · Anil Kumar Vuppala¹

Received: 7 February 2020 / Revised: 10 September 2020 / Accepted: 13 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

An epoch is the abrupt closure event within a glottal cycle at which significant excitation of the vocal-tract system occurs during the production of voiced speech. The state-of-the-art zero frequency filtering technique is a simple and efficient method that extracts epochs robustly from clean speech. However, it performs poorly on telephone-quality speech because spurious zero crossings in the epoch evidence lead to a high false alarm rate. Recently, the zero-phase zero frequency resonator (ZP-ZFR) was proposed as an alternative to the zero frequency filter for stable implementation of the zero frequency filtering technique. In this study, a higher-order ZP-ZFR is investigated to improve the performance of zero frequency filtering for epoch extraction from telephonic speech. The proposed ZP-ZFR method is evaluated quantitatively on telephonic speech simulated from six standard databases that have simultaneous electroglottograph recordings as ground truth. Experimental results suggest that the proposed method performs significantly better than the state-of-the-art methods in terms of identification rate and false alarm rate.

Keywords Epoch extraction · Telephonic speech · Zero-phase zero frequency filtering
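For orientation, the conventional zero frequency filtering technique mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classical cascade (differencing, two ideal zero-frequency resonators, repeated local-mean subtraction, and picking of positive-going zero crossings), not the higher-order ZP-ZFR proposed in this paper. The function name `zff_epochs`, the assumption that the average pitch period is known in advance, the 1.5-period trend-removal window, and the three trend-removal passes are all illustrative choices rather than settings taken from the paper.

```python
import numpy as np

def zff_epochs(speech, fs, avg_pitch_ms=10.0, n_trend_passes=3):
    """Sketch of conventional zero frequency filtering (hypothetical helper).

    Returns (epoch_indices, zff_signal). avg_pitch_ms is an assumed,
    externally supplied average pitch period in milliseconds.
    """
    s = np.asarray(speech, dtype=np.float64)

    # 1) Difference the signal to remove any DC offset.
    x = np.diff(s, prepend=s[0])

    # 2) Two ideal zero-frequency resonators, each 1 / (1 - z^-1)^2,
    #    are equivalent to four cumulative sums in cascade.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # 3) Remove the polynomial trend by repeatedly subtracting a local mean
    #    computed over roughly 1.5 average pitch periods.
    half = int(round(0.75 * avg_pitch_ms * 1e-3 * fs))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(n_trend_passes):
        y = y - np.convolve(y, kernel, mode="same")

    # 4) Epoch candidates are the positive-going zero crossings. Samples near
    #    the edges are discarded because the zero-padded local mean is
    #    unreliable there.
    idx = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
    margin = n_trend_passes * half + 1
    keep = (idx >= margin) & (idx < len(y) - margin)
    return idx[keep], y

# Usage on a synthetic 100 Hz impulse train sampled at 8 kHz.
fs = 8000
s = np.zeros(fs)
s[40::80] = -1.0
epochs, zff = zff_epochs(s, fs, avg_pitch_ms=10.0)
print(len(epochs), np.diff(epochs)[:5])
```

On clean, strongly periodic input such as this impulse train, the sketch yields one zero crossing per pitch period. The spurious zero crossings that the abstract reports for telephone-quality speech are not reproduced by this clean example.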

¹ Speech Processing Laboratory, LTRC, KCIS, International Institute of Information Technology, Hyderabad 500032, India

Circuits, Systems, and Signal Processing

1 Introduction

An epoch is the location within a glottal cycle at which significant excitation of the vocal-tract system occurs, owing to the abrupt closure of the vocal folds during phonation. The epoch is also referred to as the glottal closure instant (GCI) [3]. Accurate and robust detection of epochs from the speech signal is useful in glottal source analysis [2,8], accurate estimation of vocal-tract system information [1], prosody modification [22], emotional speech analysis [19], text-to-speech synthesis [4], and pathological speech analysis [9], among other applications. The performance of epoch extraction methods depends on how effectively they attenuate the higher-order harmonics in the speech signal. To accomplish this, the majority of epoch extraction methods use linear prediction (LP) analysis and smoothing techniques. In the literature, LP analysis has been investigated for extracting GCIs from speech using either the LP residual or the glottal flow derivative as epoch evidence, in which the higher harmonics are suppressed through inverse filtering. The performance o