A voice activity detection algorithm in spectro-temporal domain using sparse representation


ORIGINAL ARTICLE

Mohadese Eshaghi¹ · Farbod Razzazi¹ · Alireza Behrad²

Received: 28 November 2016 / Accepted: 11 July 2018

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract This paper describes a new algorithm for voice activity detection (VAD) based on sparse representation in the spectro-temporal domain. Our audio classification algorithm uses multi-scale spectro-temporal modulation features, which are extracted with an auditory cortex model. The key concept in sparse representation is that any speech fragment can be represented as a linear combination of a small number of exemplar speech tokens. The algorithm first transforms the speech into the spectro-temporal domain, which decomposes it into auditory-based features with multiple scales of temporal and spectral resolution. Each frame is then divided into several sub-cubes in this domain, and speech is detected using the sparse representations of these sub-cubes. Simulation results illustrate the effectiveness of the new VAD algorithm: the achieved performance is 90.11% and 91.75% at −5 dB SNR in white and car noise, respectively, outperforming most state-of-the-art VAD algorithms.

Keywords Speech processing · Voice activity detector · VAD · Spectro-temporal domain representation · Sparse representation
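The detection stage summarized in the abstract (split each frame's spectro-temporal representation into sub-cubes, code each sub-cube sparsely over exemplars, then decide speech or non-speech) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The (scale, rate, frequency) axis order, the non-overlapping tiling, the orthogonal matching pursuit (OMP) solver, the class-wise residual rule borrowed from sparse-representation classification, and the majority vote over sub-cubes are all assumed choices. The function names `split_into_subcubes`, `omp`, `classify_subcube` and `frame_is_speech` are hypothetical. The dictionaries `D_speech` and `D_noise` are assumed to hold unit-norm exemplar vectors as columns.

```python
import numpy as np


def split_into_subcubes(cube, sizes):
    """Tile one frame's (scale, rate, freq) cube into non-overlapping sub-cubes.

    Axis order and tiling are assumptions; edge cells that do not fill a whole
    sub-cube are dropped. Each sub-cube is returned flattened to a vector.
    """
    s, r, f = sizes
    S, R, F = cube.shape
    blocks = []
    for i in range(0, S - s + 1, s):
        for j in range(0, R - r + 1, r):
            for k in range(0, F - f + 1, f):
                blocks.append(cube[i:i + s, j:j + r, k:k + f].ravel())
    return blocks


def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedily select at most n_nonzero columns
    of D and refit y on the selected columns by least squares each step."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:  # no new atom helps; stop early
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def classify_subcube(y, D_speech, D_noise, n_nonzero=3):
    """Code y over [speech | noise] exemplars, then compare the residual left
    when y is rebuilt from each class's coefficients alone (assumed rule)."""
    D = np.hstack([D_speech, D_noise])
    x = omp(D, y, n_nonzero)
    n_s = D_speech.shape[1]
    r_speech = np.linalg.norm(y - D_speech @ x[:n_s])
    r_noise = np.linalg.norm(y - D_noise @ x[n_s:])
    return r_speech < r_noise


def frame_is_speech(cube, sizes, D_speech, D_noise, n_nonzero=3):
    """Label a frame as speech if most of its sub-cubes vote speech
    (majority voting is a hypothetical fusion rule)."""
    votes = [
        classify_subcube(v / (np.linalg.norm(v) + 1e-12), D_speech, D_noise, n_nonzero)
        for v in split_into_subcubes(cube, sizes)
    ]
    return sum(votes) > len(votes) / 2
```

Comparing residuals per class, rather than only the total reconstruction error, is what turns the sparse code into a classifier. A speech sub-cube should be explained mostly by speech exemplars, so discarding its noise coefficients should cost little reconstruction accuracy.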

¹ Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
² Electrical and Electronic Engineering Department, Shahed University, Tehran, Iran
* Corresponding author: Farbod Razzazi

1 Introduction

Voice activity detection (VAD), the ability to distinguish speech from environmental noise, is an integral part of many speech communication systems, such as speech coding, speech recognition, hands-free telephony, and echo cancellation. For instance, in a GSM-based communication system, a VAD scheme reduces battery power consumption through discontinuous transmission when a speech pause is detected [1]. Moreover, a VAD algorithm may be used in variable bit rate speech encoding systems to control the average bit rate and the overall quality of speech encoding.

VAD algorithms have a long and rich history, and a considerable amount of research has addressed VAD under extremely noisy conditions [2–5]. Sohn et al. presented a VAD algorithm with noise spectrum adaptation based on soft decision techniques [6]. Its decision rule is derived from the generalized likelihood ratio test, assuming that the noise statistics are known a priori. Cho et al. later presented an improved version of Sohn's algorithm [7]. Specifically, Cho presented a smoothed likelihood ratio test to reduce the detection
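The likelihood ratio test behind Sohn's detector [6] can be illustrated with a short sketch under the Gaussian statistical model commonly associated with that work. Each DFT bin k of a frame is scored by the log-likelihood ratio log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k = |X_k|² / λ_N(k) is the a posteriori SNR and ξ_k is the a priori SNR. The frame is labeled speech when the mean over bins exceeds a threshold η. As in the text, the noise power spectrum λ_N is assumed known. The simplified estimate ξ_k = max(γ_k − 1, 0), the threshold value, and the names `mean_log_lr` and `lrt_vad` are assumptions for illustration. This sketch omits Sohn's full soft-decision and noise-adaptation scheme and Cho's smoothed variant.

```python
import numpy as np


def mean_log_lr(frame_power, noise_psd):
    """Mean per-bin log-likelihood ratio of one frame under a Gaussian model.

    frame_power: |X_k|^2 for each DFT bin; noise_psd: known noise power per bin.
    """
    gamma = frame_power / noise_psd            # a posteriori SNR per bin
    xi = np.maximum(gamma - 1.0, 0.0)          # simplified (ML) a priori SNR estimate
    llr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    return float(np.mean(llr))


def lrt_vad(frames_power, noise_psd, eta=0.5):
    """Return one speech/non-speech decision per frame (eta is a hypothetical threshold)."""
    return [mean_log_lr(p, noise_psd) > eta for p in frames_power]
```

With this estimate, each bin contributes γ_k − 1 − log γ_k ≥ 0 when γ_k > 1 and zero otherwise. A frame whose power exactly matches the noise spectrum therefore scores 0.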
