Multi-objective long-short term memory recurrent neural networks for speech enhancement

ORIGINAL RESEARCH

Nasir Saleem (1, 2) · Muhammad Irfan Khattak (1) · Mu’ath Al-Hasan (3) · Atif Jan (1)
Received: 25 July 2020 / Accepted: 3 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Speech-in-noise perception is an important research problem in many real-world multimedia applications. Conventional noise-reduction methods have contributed significantly; however, they rely on a priori information about the noise signals. Deep learning approaches have been developed to enhance speech in nonstationary noisy backgrounds, and their benefits have been evaluated in terms of perceived speech quality and intelligibility. In this paper, a multi-objective speech enhancement method based on the Long-Short Term Memory (LSTM) recurrent neural network (RNN) is proposed to simultaneously estimate the magnitude and phase spectra of clean speech. During training, the noisy phase spectrum is incorporated as a target, and the unstructured phase spectrum is transformed into its derivative, which has the same structure as the corresponding magnitude spectrum. Critical Band Importance Functions (CBIFs) are used in the training process to further improve network performance. The results verify that the proposed multi-objective LSTM (MO-LSTM) outperforms the standard magnitude-aware LSTM (MA-LSTM), magnitude-aware DNN (MA-DNN), phase-aware DNN (PA-DNN), magnitude-aware GNN (MA-GNN) and magnitude-aware CNN (MA-CNN). Moreover, the proposed method considerably improves speech quality, intelligibility, noise reduction and automatic speech recognition (ASR) in varying noisy backgrounds, as confirmed by ANalysis Of VAriance (ANOVA) statistical analysis.

Keywords Speech enhancement · LSTM · DNN · ASR · RNN · Intelligibility · Speech quality
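The two training targets and the band-weighted objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which phase derivative is used, so the sketch assumes the time derivative of the phase (instantaneous-frequency deviation). The frame length, hop size, balance factor `alpha`, and all function names (`stft`, `wrap`, `phase_time_derivative`, `multi_objective_loss`) are hypothetical; the per-bin weights are a placeholder ramp, not published critical-band importance values; and the LSTM itself is omitted, with noisy-speech features standing in for network outputs only to exercise the loss.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed one-sided STFT; returns complex array (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def wrap(phi):
    """Map angles to the principal interval [-pi, pi]."""
    return np.angle(np.exp(1j * phi))


def phase_time_derivative(X, n_fft=512, hop=128):
    """Instantaneous-frequency deviation (ASSUMED choice of phase derivative).

    Frame-to-frame phase advance minus the advance expected from each bin's
    centre frequency, wrapped to [-pi, pi]. The result has the same
    (frames, bins) layout as |X|; frame 0 has no predecessor and is set to 0.
    """
    phase = np.angle(X)
    k = np.arange(X.shape[1])
    expected = 2 * np.pi * k * hop / n_fft
    dphi = np.zeros_like(phase)
    dphi[1:] = wrap(np.diff(phase, axis=0) - expected)
    return dphi


def multi_objective_loss(mag_hat, mag, dphi_hat, dphi, band_weights, alpha=0.5):
    """Band-weighted magnitude MSE plus band-weighted wrapped phase-derivative error.

    band_weights: one non-negative weight per frequency bin (a HYPOTHETICAL
    stand-in for band-importance values mapped onto STFT bins).
    alpha balances the magnitude and phase objectives.
    """
    w = band_weights / band_weights.sum()
    mag_err = np.mean(np.sum(w * (mag_hat - mag) ** 2, axis=1))
    phase_err = np.mean(np.sum(w * wrap(dphi_hat - dphi) ** 2, axis=1))
    return alpha * mag_err + (1 - alpha) * phase_err


# Toy example: a 440 Hz tone sampled at 16 kHz, plus additive white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(fs)

X_clean, X_noisy = stft(clean), stft(noisy)
mag_c, mag_n = np.abs(X_clean), np.abs(X_noisy)
dphi_c, dphi_n = phase_time_derivative(X_clean), phase_time_derivative(X_noisy)

# HYPOTHETICAL weights: a simple ramp, NOT published CBIF values.
weights = np.linspace(1.0, 0.2, mag_c.shape[1])

print(mag_c.shape == dphi_c.shape)  # True
# Noisy features stand in for network outputs here, only to exercise the loss.
print(multi_objective_loss(mag_n, mag_c, dphi_n, dphi_c, weights))
```

The phase-derivative error is computed on a wrapped difference because phase and its derivatives are only defined modulo 2π; comparing raw values would penalize differences that are really whole multiples of 2π.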

* Nasir Saleem (corresponding author)
1 Department of Electrical Engineering, University of Engineering and Technology, Peshawar 25000, KPK, Pakistan
2 Department of Electrical Engineering, FET, Gomal University, Dera Ismail Khan 29050, KPK, Pakistan
3 College of Engineering, Al Ain University, Al Ain, United Arab Emirates

1 Introduction

Speech enhancement aims to recover clean speech from noisy speech. In conventional speech enhancement algorithms (Boll 1979; Cohen and Berdugo 2001; Ephraim and Van Trees 1995; Ephraim and Malah 1985; Saleem et al. 2019a, b, c, d; Saleem and Irfan 2018; Shoba and Rajavel 2020; Zao et al. 2014), this restoration relies on unsupervised mathematical assumptions about the speech or noise signals. These algorithms often introduce musical-noise artifacts, which limit speech enhancement performance.

Supervised machine-learning approaches to speech enhancement have demonstrated remarkable potential for improving the quality and intelligibility of noisy speech. Non-negative matrix factorization (NMF) (Kwon et al. 2014) is one well-known example of a machine-learning approach, in which speech and noise basis functions are acquired indepe
