A voice activity detection algorithm in spectro-temporal domain using sparse representation



ORIGINAL ARTICLE

Mohadese Eshaghi¹ · Farbod Razzazi¹ · Alireza Behrad²

Received: 28 November 2016 / Accepted: 11 July 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract

This paper describes a new algorithm for voice activity detection (VAD) based on sparse representation in the spectro-temporal domain. Our audio classification algorithm is based on multi-scale spectro-temporal modulation features, which are extracted using an auditory cortex model. The key concept in sparse representation is that any speech fragment can be represented as a linear combination of a small number of exemplar speech tokens. The algorithm first transforms the speech into the spectro-temporal domain, decomposing it into auditory-based features with multiple scales of temporal and spectral resolution. Next, each frame is divided into several sub-cubes in the new domain. Finally, the algorithm detects speech in the signal by using the sparse representation of the sub-cubes of the frames in this domain. Simulation results are given to illustrate the effectiveness of the new VAD algorithm. The results show that the achieved performance is 90.11% and 91.75% at −5 dB SNR in white and car noise, respectively, outperforming most state-of-the-art VAD algorithms.

Keywords  Speech processing · Voice activity detector · VAD · Spectro-temporal domain representation · Sparse representation

* Farbod Razzazi [email protected]
Mohadese Eshaghi [email protected]
Alireza Behrad [email protected]

¹ Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
² Electrical and Electronic Engineering Department, Shahed University, Tehran, Iran

1 Introduction

Voice activity detection (VAD), the ability to distinguish speech from environmental noise, is an integral part of a variety of speech communication systems, such as speech coding, speech recognition, hands-free telephony, and echo cancellation. For instance, in a GSM-based communication system, a VAD scheme is used to reduce battery power consumption through discontinuous transmission when a speech pause is detected [1]. Moreover, a VAD algorithm may be used in a variable bit rate speech encoding system to control the average bit rate and the overall quality of speech encoding.

VAD algorithms have a long and rich history. A considerable amount of research has addressed VAD under extremely noisy conditions [2–5]. Sohn et al. presented a VAD algorithm with noise spectrum adaptation based on soft decision techniques [6]. The decision rule was derived from the generalized likelihood ratio test under the assumption that the noise statistics are known a priori. Cho et al. later presented an improved version of Sohn's algorithm [7]. Specifically, Cho presented a smoothed likelihood ratio test to reduce the detection