Detection of Fricative Landmarks Using Spectral Weighting: A Temporal Approach


Hari Krishna Vydana¹ · Anil Kumar Vuppala¹

Received: 1 April 2019 / Revised: 14 October 2020 / Accepted: 17 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Fricatives are characterized by two prime acoustic properties: a high-frequency spectral concentration and a noisy nature. Spectral-domain approaches for detecting fricatives employ a time–frequency representation to compute acoustic cues such as band energy ratio, spectral centroid, and dominant resonant frequency. The detection accuracy of these approaches depends on the efficiency of the employed time–frequency representation. This work explores an approach that detects fricatives from speech without requiring any time–frequency representation. A time-domain operation is proposed that implicitly emphasizes the high-frequency spectral characteristics of fricatives. The proposed approach scales the spectrum of the speech signal using a scaling function k², where k is the discrete frequency. This spectral weighting function can be approximated as a cascaded temporal difference operation over the speech signal. The emphasized regions in the spectrally weighted speech signal are quantified to detect fricative regions. In contrast to the spectral-domain approaches, the predictability measure-based approach in the literature relies on capturing the noisy nature of fricatives. Because the proposed approach and the predictability measure-based approach rely on two complementary properties for detecting fricatives, a combination of these approaches is put forth in this work. The proposed approach has performed better than the state-of-the-art fricative detectors. To study the significance of the proposed evidence, an early fusion between the proposed evidence and the feature-space maximum log-likelihood transform features is explored for developing speech recognition systems.
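The link between weighting the spectrum by k² and a cascaded temporal difference can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 16 kHz sampling rate, the two test tones, and the circular boundary handling are all assumptions chosen for illustration. A first difference y[n] = x[n] − x[n−1] has magnitude response 2|sin(πk/N)| at DFT bin k, which is close to 2πk/N at low k, so applying it twice gives a gain that grows roughly as k².

```python
import numpy as np


def cascaded_difference(x, order=2):
    """Apply a circular first-order temporal difference `order` times.

    Hypothetical helper: the circular boundary (np.roll) is an assumption
    made so that the result matches the DFT relation exactly.
    """
    y = np.asarray(x, dtype=float)
    for _ in range(order):
        y = y - np.roll(y, 1)  # y[n] - y[n-1], wrapping at n = 0
    return y


def difference_gain(k, n_fft, order=2):
    """Magnitude response of the cascaded difference at DFT bin k."""
    return (2.0 * np.abs(np.sin(np.pi * np.asarray(k) / n_fft))) ** order


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(np.square(x)))


fs = 16000                  # assumed sampling rate in Hz
n = 1600                    # 0.1 s, an integer number of periods for both tones
t = np.arange(n) / fs
low = np.sin(2 * np.pi * 500 * t)    # stand-in for low-frequency (vowel-like) energy
high = np.sin(2 * np.pi * 5000 * t)  # stand-in for high-frequency (fricative-like) energy

# Amplitude gain applied to each tone by the second-order difference.
gain_low = rms(cascaded_difference(low)) / rms(low)
gain_high = rms(cascaded_difference(high)) / rms(high)
print(f"gain at 500 Hz: {gain_low:.4f}, gain at 5000 Hz: {gain_high:.4f}")
```

In this sketch the 500 Hz tone is attenuated to roughly 0.04 of its amplitude while the 5000 Hz tone is amplified by roughly 2.8, which is the high-frequency emphasis the abstract describes. How the emphasized regions are then quantified into fricative decisions is not covered here.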

Hari Krishna Vydana [email protected]
Anil Kumar Vuppala [email protected]

¹ Speech Processing Laboratory, LTRC, International Institute of Information Technology, Hyderabad, India

Circuits, Systems, and Signal Processing

Keywords Fricative detection · Landmarks · Temporal approach · Spectral concentration · Scaling function · Cascaded temporal difference

1 Introduction
Fricatives are produced by constricting the vocal tract at some point along its length and forcing air through with sufficient volume velocity that turbulence is generated downstream of the constriction [24,25]. In a voiced fricative, both events, i.e., frication and the glottal closure instant, occur mutually exclusively within a single glottal cycle [30]. Fricatives are characterized by two prime acoustic characteristics, viz., having the majority of the spectral distribution concentrated above 3 kHz [25] and possessing a noisy nature [3]. Though there has been significant work in detecting the place of articulation of fricatives [1,4,17,18], det