An Efficient VAD Based on a Generalized Gaussian PDF
Abstract. The emerging applications of wireless speech communication are demanding increasing levels of performance in adverse noise environments, together with the design of high-response-rate speech processing systems. This is a serious obstacle to meeting the demands of modern applications, and therefore these systems often need a noise reduction algorithm working in combination with a precise voice activity detector (VAD). This paper presents a new VAD for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving multiple and correlated observations (MCO). An analysis of the methodology for N = {2, 3} shows the robustness of the proposed approach through a clear reduction of the classification error as the number of observations is increased. The algorithm is also compared to different VAD methods, including the G.729, AMR and AFE standards as well as recently reported algorithms, and shows a sustained advantage in speech/non-speech detection accuracy and speech recognition performance.
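To make the idea of an LRT over multiple correlated observations concrete, the following Python sketch evaluates the log-likelihood ratio of N observations that are assumed to be zero-mean and jointly Gaussian under both hypotheses (H0: non-speech, H1: speech). It is a generic Gaussian LRT, not the authors' MCO-LRT derivation. The covariance model in `correlated_cov` (variance times ρ^|i−j|), the function names and the threshold `eta` are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(c: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == c (c must be symmetric positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def gaussian_log_pdf(y: Sequence[float], c: Matrix) -> float:
    """Log-density of a zero-mean multivariate Gaussian N(0, c) evaluated at y."""
    n = len(y)
    low = cholesky(c)
    # Forward substitution solves low @ z = y, so y^T c^{-1} y == |z|^2.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def correlated_cov(n: int, var: float, rho: float) -> Matrix:
    """Hypothetical covariance model: var * rho**|i-j| between observations i and j."""
    return [[var * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def mco_log_lrt(y: Sequence[float], c0: Matrix, c1: Matrix) -> float:
    """log p(y | H1) - log p(y | H0) for jointly Gaussian, correlated observations."""
    return gaussian_log_pdf(y, c1) - gaussian_log_pdf(y, c0)


def decide(y: Sequence[float], c0: Matrix, c1: Matrix, eta: float = 0.0) -> bool:
    """True = speech (H1) when the log-LRT exceeds the threshold eta."""
    return mco_log_lrt(y, c0, c1) > eta


if __name__ == "__main__":
    c0 = correlated_cov(2, 1.0, 0.5)  # non-speech: noise only
    c1 = correlated_cov(2, 4.0, 0.5)  # speech: speech plus noise
    print(mco_log_lrt([2.5, 2.0], c0, c1), decide([2.5, 2.0], c0, c1))
```

With noise variance 1, speech-plus-noise variance 4 and ρ = 0.5, the pair (2.5, 2.0) gives a log-LRT of 2.625 − ln 4 ≈ 1.24 and is classified as speech. Setting ρ = 0 makes both covariances diagonal, so the log-LRT reduces to a sum of per-observation terms, which corresponds to the independence assumption discussed in the introduction.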
1 Introduction
The emerging applications of speech communication are demanding increasing levels of performance in adverse noise environments. Examples of such systems are the new voice services, including discontinuous speech transmission [1,2,3] or distributed speech recognition (DSR) over wireless and IP networks [4]. These systems often require a noise reduction scheme working in combination with a precise voice activity detector (VAD) [5] to estimate the noise spectrum during non-speech periods and thus compensate for its harmful effect on the speech signal. During the last decade, numerous researchers have studied different strategies for detecting speech in noise and the influence of the VAD on the performance of speech processing systems [5]. Sohn et al. [6] proposed a robust VAD algorithm based on a statistical likelihood ratio test (LRT) involving a single observation vector. Later, Cho et al. [7] suggested an improvement based on a smoothed LRT. Most VADs in use today rely on hangover algorithms based on empirical models to smooth the VAD decision. It has recently been shown [8,9] that incorporating long-term speech information into the decision rule benefits speech/pause discrimination in high-noise environments. However, an important assumption made in these previous works has to be revised: the independence of overlapped observations. In this work we propose a more realistic one: the observations are jointly Gaussian distributed with non-zero correlations. In addition, other important issues need to be addressed: i) the increased computational complexity, mainly due to the definition of the decision rule over large data sets, and ii) the optimum criterion of the decision rule. This work advanc

M. Chetouani et al. (Eds.): NOLISP 2007, LNAI 4885, pp. 246–254, 2007. © Springer-Verlag Berlin Heidelberg 2007
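The single-observation LRT of Sohn et al. [6] and the empirical hangover smoothing mentioned in the introduction can be sketched in Python as follows. The per-bin log-likelihood ratio uses the standard complex Gaussian model, log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), with a posteriori SNR γ_k and a priori SNR ξ_k, averaged over the frequency bins of a frame. The sketch assumes the noise variances and a priori SNRs are already known (in practice they must be estimated), and the counter-based hangover controlled by `hang_frames` is a generic illustration, not the scheme of [6] or [7]. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_log_lr(power: Sequence[float], noise_var: Sequence[float],
                 xi: Sequence[float]) -> float:
    """Mean per-bin log likelihood ratio for one frame (Sohn-style Gaussian model).

    power[k]     : |X_k|^2, observed power in frequency bin k
    noise_var[k] : lambda_N(k), noise variance in bin k (assumed known)
    xi[k]        : a priori SNR lambda_S(k) / lambda_N(k) (assumed known)
    """
    total = 0.0
    for p, lam_n, x in zip(power, noise_var, xi):
        gamma = p / lam_n  # a posteriori SNR
        total += gamma * x / (1.0 + x) - math.log1p(x)
    return total / len(power)


def apply_hangover(raw: Sequence[bool], hang_frames: int) -> List[bool]:
    """Keep the speech flag on for hang_frames frames after each speech frame."""
    smoothed: List[bool] = []
    remaining = 0
    for is_speech in raw:
        if is_speech:
            remaining = hang_frames
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed


def vad(frames: Sequence[Sequence[float]], noise_var: Sequence[float],
        xi: Sequence[float], eta: float = 0.0, hang_frames: int = 2) -> List[bool]:
    """Frame-wise LRT decision (True = speech) followed by hangover smoothing."""
    raw = [frame_log_lr(f, noise_var, xi) > eta for f in frames]
    return apply_hangover(raw, hang_frames)
```

For example, with unit noise variance and ξ_k = 1 in two bins, a frame with bin powers (4, 4) scores 2 − ln 2 ≈ 1.31 and is marked as speech, while frames with powers (1, 1) score 0.5 − ln 2 ≈ −0.19. With `hang_frames=2`, the sequence of raw decisions [False, True, False, False, False] becomes [False, True, True, True, False]: the hangover only extends speech decisions and never removes them.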