A real-world noise removal with wavelet speech feature


Samba Raju Chiluveru¹ · Manoj Tripathy¹

Received: 24 January 2020 / Accepted: 12 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Real-world noise signals are non-stationary, and they are typically a mixture of more than one non-stationary noise signal. Most conventional speech enhancement algorithms (SEAs) focus primarily on speech corrupted by a single noise source, which is far from real-world conditions. In this article, we discuss speech enhancement in real-world environments with a new speech feature. The novelty of this article is three-fold: (1) the proposed model is analyzed in real-world environments; (2) the proposed model uses discrete wavelet transform (DWT) coefficients as input features; (3) the proposed Deep Denoising Autoencoder (DDAE) is designed experimentally. The results obtained with the proposed feature are compared with those of conventional speech features such as FFT-Amplitude, Log-Magnitude, Mel-frequency cepstral coefficients (MFCCs), and Gammatone filter cepstral coefficients (GFCCs). The performance of the proposed method is also compared with conventional speech enhancement methods. The enhanced signal is evaluated with speech quality measures such as perceptual evaluation of speech quality (PESQ), weighted spectral slope (WSS), and log-likelihood ratio (LLR). Similarly, speech intelligibility is measured with short-time objective intelligibility (STOI). The results show that the proposed SEA model with the DWT feature improves quality and intelligibility under all tested real-world signal-to-noise ratio (SNR) conditions.

Keywords  Speech enhancement · Discrete wavelet transform coefficients · Deep denoising autoencoder · Short-time objective intelligibility · Perceptual evaluation speech quality
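To make the idea of DWT coefficients as input features concrete, the following is a minimal sketch of frame-wise wavelet feature extraction. It is an illustration under stated assumptions, not the authors' implementation: the Haar wavelet, the three decomposition levels, the 256-sample frame length, the 128-sample hop, and the function names `haar_dwt`, `dwt_features`, and `frame_signal` are all hypothetical choices that the text above does not specify.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])  # pad odd-length input by repeating the last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
    return approx, detail

def dwt_features(frame, levels=3):
    """Multi-level Haar DWT of one frame, concatenated as [cA_L, cD_L, ..., cD_1].

    The wavelet and the number of levels are assumptions made for illustration.
    """
    approx = np.asarray(frame, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Coarsest coefficients first, finest detail band last.
    return np.concatenate([approx] + details[::-1])

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (frame size and hop are assumed)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(1024)  # stand-in for a noisy speech waveform
    feats = np.stack([dwt_features(f) for f in frame_signal(noisy)])
    print(feats.shape)  # one feature vector per frame
```

Each row of `feats` could then serve as one input vector to a denoising network. Because the Haar transform used here is orthonormal, the feature vector has the same energy as the frame it was computed from.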

1 Introduction
Speech enhancement is a significant research problem in audio signal processing. The goal is to improve the quality and intelligibility of speech signals corrupted by noise. Two types of SEAs are reported in the literature, viz. the Multi-channel Speech Enhancement Algorithm (MSEA) and the Single-channel Speech Enhancement Algorithm (SSEA). Although the MSEA method can achieve good results in terms of intelligibility and quality (Hersbach et al. 2012; Spriet et al. 2007), it requires a secondary microphone and a new headphone combination, which introduces extra hardware cost and additional power consumption. The performance of MSEAs decreases in reverberant, multi-talker environments, and

their applicability is restricted to acoustic situations in which the target and the noise are spatially separated (Chen et al. 2015). In a real-time environment, speech signals are recorded with a single microphone and are contaminated with background noise; therefore, SSEAs are more attractive and economically more workable. Single-channel SEA is used to improve the quality and intelligibility of recorded speech signals in telecommunication terminal