An efficient approach for detecting vowel onset and offset points in speech signal
Sarmila Garnaik¹ · Avinash Kumar² · Gayadhar Pradhan² · Kabiraj Sethi³
Received: 7 October 2018 / Accepted: 10 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Vowel onset point (VOP) and vowel end point (VEP) are the instants at which a vowel starts and ends, respectively. VOPs and VEPs are equally important for the accurate detection of vowels and for the development of various speech-based applications. Detecting both VOPs and VEPs simultaneously within a single algorithm is very challenging. In this paper, an efficient approach is proposed for robustly extracting the magnitude dynamics at each time instant of the speech signal. The mean and variance of the magnitude dynamics over an analysis frame are significantly higher for vowels than for nonvowel, silence and noise regions. In this study, the average magnitude dynamics (AMD) over an analysis frame is used as the front-end feature. The AMD values at each time instant are then nonlinearly mapped using a sigmoidal function (NL-AMD) to sharpen the transitions at the VEPs and suppress the variations in the high-magnitude regions. The NL-AMD is equally discriminative at the VOPs and the VEPs; consequently, most of the VOPs and VEPs are detected within a small deviation of their reference locations. The experimental evaluations presented in this study show that, under clean as well as noisy test conditions, the proposed feature outperforms previously reported front-end features for detecting VOPs and VEPs.

Keywords: Signal dynamics · Nonlinear mapping · Vowel · VOP · VEP
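The abstract outlines a three-stage front end: a per-sample magnitude-dynamics signal, its average over an analysis frame (AMD), and a sigmoidal nonlinear mapping (NL-AMD). The Python sketch below illustrates that pipeline under clearly stated assumptions. The excerpt does not define the magnitude dynamics, so the hypothetical `magnitude_dynamics` function uses the absolute sample-to-sample difference as a stand-in. The frame length, the sigmoid `slope` and `centre`, and the 0.5 threshold used by the hypothetical `threshold_crossings` helper to mark rising (VOP-like) and falling (VEP-like) crossings are illustrative choices, not the authors' parameters or detection rule.

```python
import math
from typing import List, Tuple


def magnitude_dynamics(x: List[float]) -> List[float]:
    """Per-sample magnitude dynamics.

    ASSUMPTION: not defined in this excerpt; |x[n] - x[n-1]| is used as a
    hypothetical stand-in (the first sample is padded with 0).
    """
    if not x:
        return []
    return [0.0] + [abs(x[n] - x[n - 1]) for n in range(1, len(x))]


def average_magnitude_dynamics(md: List[float], frame_len: int) -> List[float]:
    """AMD: mean of the magnitude dynamics over a frame centred on each sample.

    The window spans 2 * (frame_len // 2) + 1 samples and is truncated at the
    signal edges. Prefix sums keep the cost linear in the signal length.
    """
    half = frame_len // 2
    prefix = [0.0]
    for v in md:
        prefix.append(prefix[-1] + v)
    amd = []
    for n in range(len(md)):
        lo = max(0, n - half)
        hi = min(len(md), n + half + 1)
        amd.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return amd


def sigmoid_map(amd: List[float], slope: float, centre: float) -> List[float]:
    """NL-AMD sketch: logistic map 1 / (1 + exp(-slope * (a - centre))).

    Values well above `centre` saturate near 1 (suppressing variation in
    high-AMD regions); values near `centre` are stretched (sharper transitions).
    The two branches avoid math.exp overflow for large |z|.
    """
    out = []
    for a in amd:
        z = slope * (a - centre)
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            out.append(e / (1.0 + e))
    return out


def threshold_crossings(
    nl: List[float], threshold: float = 0.5
) -> Tuple[List[int], List[int]]:
    """HYPOTHETICAL detection rule: rising crossings are VOP candidates and
    falling crossings are VEP candidates (returned as sample indices)."""
    rising: List[int] = []
    falling: List[int] = []
    for n in range(1, len(nl)):
        if nl[n - 1] < threshold <= nl[n]:
            rising.append(n)
        elif nl[n - 1] >= threshold > nl[n]:
            falling.append(n)
    return rising, falling
```

As a toy check, take 200 zero samples, then 400 samples of a 200 Hz sinusoid at an 8 kHz sampling rate, then 200 more zeros. With `frame_len=81` (about 10 ms), `centre` set to half the peak AMD and `slope = 10 / peak`, the sketch reports one rising crossing at sample 201 and one falling crossing at sample 601. With a fixed 0.5 threshold these crossings coincide with the AMD itself crossing `centre`. The sigmoid here only reshapes the contour, and any benefit from that reshaping would appear in a later evidence or peak-picking stage that this sketch does not include.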
* Corresponding author: Avinash Kumar
¹ Department of Electrical and Electronics Engineering, Veer Surendra Sai University of Technology, Odisha, India
² Department of Electronics and Communication Engineering, National Institute of Technology Patna, Patna, India
³ Department of Electronics and Telecommunication Engineering, Veer Surendra Sai University of Technology, Odisha, India

1 Introduction
Vowels are the dominant voiced regions in a speech utterance. The instants at which a vowel starts and ends are known as the vowel onset point (VOP) and the vowel end point (VEP), respectively (Prasanna et al. 2009; Yadav and Rao 2013). Both the frequency response of the vocal-tract system and the excitation source information are better manifested within vowels (Prasanna and Pradhan 2011, 2013). In previously reported works, VOPs/vowels have been used to build effective speaker recognition systems (Pradhan and Prasanna 2011, 2013; Almaadeed et al. 2015; Fakotakis et al. 1993; Daqrouq and Tutunji 2015). Knowledge of VOPs/vowels has also been explored for the detection of consonant-vowel units (Vuppala et al. 2012b, 2011), keyword spotting (Reddy et al. 2008), speech segmentation (Panda and Nayak 2016), dialect classification (Themistocleous 2017),