An efficient voice activity detection algorithm by combining statistical model and energy detection


RESEARCH

Open Access

Ji Wu* and Xiao-Lei Zhang

Abstract

In this article, we present a new voice activity detection (VAD) algorithm that combines statistical models with an empirical rule-based energy detection algorithm. Specifically, it separates speech segments from background noise in two steps. In the first step, the VAD efficiently detects possible speech endpoints using the empirical rule-based energy detection algorithm. However, these candidate endpoints are not accurate enough when the signal-to-noise ratio is low. Therefore, in the second step, we propose a new Gaussian mixture model-based multiple-observation log-likelihood ratio algorithm to align the endpoints to their optimal positions. Several experiments are conducted to evaluate the proposed VAD on both accuracy and efficiency. The results show that it can achieve better performance than the six reference VADs in various noise scenarios.

Keywords: energy detection, Gaussian mixture model (GMM), multiple-observation, voice activity detection (VAD)
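To make the first step concrete, the following is a minimal sketch of a rule-based energy detector that proposes candidate speech endpoints. It is not the authors' rule set, which is not given in this excerpt. The function names (`frame_log_energy`, `detect_endpoints`) and all parameter values (frame length, hop, noise-estimation window, 6 dB margin, minimum duration, hangover) are hypothetical choices made for illustration only.

```python
import numpy as np


def frame_log_energy(signal, frame_len=200, hop=80):
    """Short-time log energy in dB for each frame (hypothetical framing values)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # A small constant avoids log(0) on all-zero frames.
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energies


def detect_endpoints(energies, noise_frames=10, margin_db=6.0,
                     min_speech=5, hangover=8):
    """Return candidate (start, end) frame indices, end exclusive.

    Illustrative rules, assumed rather than taken from the paper:
      * the first `noise_frames` frames contain noise only;
      * a frame is active if it exceeds the noise floor by `margin_db`;
      * a segment closes after more than `hangover` inactive frames;
      * segments shorter than `min_speech` frames are discarded.
    """
    noise_floor = np.mean(energies[:noise_frames])
    active = energies > noise_floor + margin_db

    segments = []
    start = None
    silence_run = 0
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run > hangover:
                end = i - silence_run + 1  # one past the last active frame
                if end - start >= min_speech:
                    segments.append((start, end))
                start = None
                silence_run = 0

    # Close a segment that is still open at the end of the signal.
    if start is not None:
        end = len(active) - silence_run
        if end - start >= min_speech:
            segments.append((start, end))
    return segments
```

A fixed margin above an estimated noise floor is cheap to compute, but it tends to become unreliable at low signal-to-noise ratios, which is the case the second, statistical step is meant to handle by refining these candidate endpoints.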

1 Introduction

A voice activity detector (VAD) separates speech from background noise. It is used in many modern speech communication systems, such as speech recognition, speech coding, noisy speech enhancement, mobile telephony, and very small aperture terminals. During the past few decades, researchers have tried many approaches to improve VAD performance. Traditional approaches include time-domain energy [1,2], pitch detection [3], and zero-crossing rate [2,4]. More recently, several new spectral energy-based features were proposed, including the energy-entropy feature [5], spatial signal correlation [6], cepstral features [7], higher-order statistics [8,9], Teager energy [10], and spectral divergence [11]. The multi-band technique, which exploits the band differences between speech and noise, was also employed to construct features [12,13]. Meanwhile, statistical models have attracted much attention. Most of this work focused on finding a suitable model for the empirical distribution of speech. Sohn [14] assumed that the speech and noise signals in the discrete Fourier transform (DFT) domain follow independent Gaussian distributions. Gazor [15] further used the Laplace distribution to model speech signals. Chang [16] analyzed the Gaussian, Laplace, and Gamma distributions in the DFT domain and integrated them with a goodness-of-fit test. Tahmasbi [17] assumed that the speech process, after transformation by a GARCH filter, has a variance gamma distribution. Ramirez [18] proposed the multiple-observation likelihood ratio test (LRT) in place of the single-frame LRT [14], which greatly improved VAD performance. More recently, many mac

* Correspondence: [email protected] Department of Electronic Engineering, Multimedia Signal and Intelligent Information Processing Laboratory, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
