A hybrid speech enhancement system with DNN based speech reconstruction and Kalman filtering

Hongjiang Yu1 · Wei-Ping Zhu1 · Zhiheng Ouyang1 · Benoit Champagne2

Received: 10 November 2019 / Revised: 22 June 2020 / Accepted: 6 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In this paper, we propose a hybrid speech enhancement system that exploits deep neural networks (DNNs) for speech reconstruction and Kalman filtering for further denoising, with the aim of improving performance under unseen noise conditions. First, two separate DNNs are trained to learn the mappings from noisy acoustic features to the clean speech magnitudes and to the line spectrum frequencies (LSFs), respectively. The estimated clean magnitudes are then combined with the phase of the noisy speech to reconstruct the estimated clean speech, while the LSFs are converted to linear prediction coefficients (LPCs) for use in Kalman filtering. Finally, the reconstructed speech is Kalman-filtered to further remove residual noise. The proposed hybrid system takes advantage of both DNN-based reconstruction and traditional Kalman filtering, and can work reliably in both matched and unmatched acoustic environments. Computer-based experiments are conducted to compare the proposed hybrid system with traditional iterative Kalman filtering and several state-of-the-art DNN-based methods under both seen and unseen noises. The results show that, compared to the DNN-based methods, the hybrid system achieves similar performance under seen noise but notably better performance under unseen noise, in terms of both speech quality and intelligibility.

Keywords Speech enhancement · Deep neural network · Kalman filter · Unmatched acoustic environment
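The two signal-processing steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the DNN outputs and the LSF-to-LPC conversion are taken as given, a standard autoregressive (AR) speech model with additive white observation noise is assumed, and the excitation and noise variances are assumed known.

```python
import numpy as np


def combine_magnitude_with_noisy_phase(est_magnitude, noisy_spectrum):
    """Attach the phase of the noisy spectrum to an estimated clean magnitude."""
    return est_magnitude * np.exp(1j * np.angle(noisy_spectrum))


def kalman_filter_frame(observed, lpc, process_var, noise_var):
    """Kalman-filter one frame using LPCs as the AR speech model (a sketch).

    Assumed model (not taken from the paper):
        s[n] = sum_k lpc[k] * s[n-1-k] + w[n],   y[n] = s[n] + v[n]
    with var(w) = process_var and var(v) = noise_var.
    """
    lpc = np.asarray(lpc, dtype=float)
    p = len(lpc)
    # Companion-form transition matrix; state x = [s[n-p+1], ..., s[n]].
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)
    F[-1, :] = lpc[::-1]
    Q = np.zeros((p, p))
    Q[-1, -1] = process_var
    x = np.zeros(p)
    P = np.eye(p) * process_var
    out = np.empty(len(observed))
    for n, y in enumerate(observed):
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step; the observation picks out the last state element.
        innovation_var = P[-1, -1] + noise_var
        gain = P[:, -1] / innovation_var
        x = x + gain * (y - x[-1])
        P = P - np.outer(gain, P[-1, :])
        out[n] = x[-1]
    return out
```

In a full system along the lines described above, `combine_magnitude_with_noisy_phase` would be applied per short-time frame before an inverse transform, and `kalman_filter_frame` would then be run on the reconstructed waveform with LPCs derived from the estimated LSFs.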

Hongjiang Yu
1 Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
2 Department of Electrical and Computer Engineering, McGill University, Montreal, Canada

Multimedia Tools and Applications

1 Introduction

In real-world environments, speech signals are often corrupted by a wide range of background noises. These disturbances cause problems in applications including voice communication, automatic speech recognition and speaker identification. As a result, speech enhancement, which aims to improve speech quality and intelligibility, has been intensively studied over the past several decades, and will likely continue to be an active research topic in speech processing, recognition and communication. Various denoising methods have been proposed in the literature, among which statistical filtering received the earliest attention. Wiener filtering is one of the well-known methods in this category; its goal is to find the optimal minimum mean square error (MMSE) estimate of the discrete Fourier transform (DFT) coefficients of the clean speech [11]. Wiener filtering introduces broadband residual noise instead of musical noise in the enhanced speech, which is undesirable even though often acceptable. Kalman filt
