VAD Based on Kernel Smoothed Function of EGARCH Models


Usoph Hamdi Salemi · Sadegh Rezaei · Saralees Nadarajah

Published online: 25 January 2013 © Springer Science+Business Media New York 2013

Abstract An algorithm for a voice activity detector (VAD) is proposed. It is based on an exponential generalized autoregressive conditional heteroscedasticity (EGARCH) filter for generalized hyperbolic (GH) and Gaussian random variables, adaptive threshold values, and autocorrelation coefficients. EGARCH models are a new variation of GARCH models used especially for economic time series. A speech signal is assumed to follow a GH distribution because the GH distribution has heavier tails than the Gaussian distribution (GD) and covers other heavy-tailed distributions such as the hyperbolic, skewed t, variance gamma (VG), normal inverse Gaussian (NIG), Cauchy, Student's t and Laplace distributions. The noise signal is assumed to be uncorrelated (white noise), although in general this is not necessary. In the proposed method, heteroscedasticity is modeled by EGARCH. A kernel smoothed function of the conditional variances and autocorrelations generates the soft detection vector. Finally, hard detection results from comparing the soft detection vector with an adaptive threshold value. The simulation results show that the proposed VAD is able to operate down to −5 dB.

Keywords EGARCH models · GARCH models · Generalized hyperbolic distribution · Voice activity detection
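The pipeline outlined in the abstract (conditional variances from EGARCH, kernel smoothing into a soft detection vector, comparison with an adaptive threshold) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the EGARCH(1,1) parameters are fixed by hand rather than estimated, Gaussian innovations are assumed in place of GH, the autocorrelation term is omitted, and the percentile-based threshold rule and all function names are hypothetical.

```python
import numpy as np

def egarch_variance(x, omega=-0.1, alpha=0.2, gamma=-0.05, beta=0.95):
    """Conditional variances from an EGARCH(1,1) recursion.

    Assumption: parameters are hand-picked, not fitted to the data.
    """
    x = np.asarray(x, dtype=float)
    log_var = np.empty(x.size)
    log_var[0] = np.log(max(np.var(x), 1e-12))  # start at the sample variance
    mean_abs_z = np.sqrt(2.0 / np.pi)           # E|z| for a standard Gaussian z
    for t in range(1, x.size):
        z = x[t - 1] / np.exp(0.5 * log_var[t - 1])  # standardized residual
        lv = (omega + beta * log_var[t - 1]
              + alpha * (abs(z) - mean_abs_z) + gamma * z)
        log_var[t] = min(max(lv, -50.0), 50.0)  # guard against overflow
    return np.exp(log_var)

def kernel_smooth(v, bandwidth=20.0):
    """Gaussian-kernel smoother with edge renormalization."""
    half = int(3 * bandwidth)
    k = np.exp(-0.5 * (np.arange(-half, half + 1) / bandwidth) ** 2)
    num = np.convolve(v, k)[half:half + len(v)]
    den = np.convolve(np.ones(len(v)), k)[half:half + len(v)]
    return num / den

def adaptive_threshold(soft, lo=10, hi=90):
    """Hypothetical rule: log-domain midpoint between two percentiles."""
    return np.exp(0.5 * (np.log(np.percentile(soft, lo))
                         + np.log(np.percentile(soft, hi))))

# Synthetic test signal: white noise with a heavy-tailed burst in the middle.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(3000)
x[1000:2000] += rng.laplace(scale=0.7, size=1000)

soft = kernel_smooth(egarch_variance(x))  # soft detection vector
threshold = adaptive_threshold(soft)
hard = soft > threshold                   # hard detection (True = voice)
```

On this synthetic signal the conditional variance rises during the burst and decays afterwards, so `hard` is expected to be mostly `True` inside samples 1000 to 2000 and mostly `False` elsewhere, with short transition regions at the boundaries.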

U. H. Salemi · S. Rezaei
Department of Statistics, Amirkabir University of Technology, Tehran, Iran

S. Nadarajah (B)
School of Mathematics, University of Manchester, Manchester, M13 9PL, UK
e-mail: [email protected]

1 Introduction

Voice activity detection (VAD) refers to the ability to distinguish voice from noise. It is an integral part of a variety of speech communication systems, such as speech coding [23], speech recognition, audio conferencing [7], speech enhancement [22], wireless communication and echo cancelation. Speech/non-speech detection is an unsolved problem in speech processing and affects numerous applications, including robust speech recognition. The speech/non-speech classification task is not as trivial as it appears, and most VAD algorithms fail when the level of background noise increases. During the last decade, various researchers have studied different strategies for detecting speech in noise and the influence of VAD on the performance of speech processing systems. Sohn et al. [23] proposed a robust VAD algorithm based on a statistical likelihood ratio test (LRT) involving a single observation vector. Later, [4] suggested an improvement based on a smoothed LRT. Most VADs that formulate the decision rule on a frame-by-frame basis use decision smoothing algorithms to improve robustness against noise. The speech/non-speech detection algorithm proposed in [21] assumes that the most significant information for detecting voice activity in a noisy speech signal resides in the time-varying signal spectrum mag