A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments


ORIGINAL ARTICLE

Weili Zhou1 · Zhen Zhu1
Received: 29 September 2019 / Accepted: 20 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Speech quality evaluation (SQE) in complex noisy environments is important for audio processing systems and quality of service. Recently, non-intrusive SQE has attracted growing attention because of its efficiency and ease of use. However, non-intrusive SQE methods are expected to underperform intrusive ones, since they have no prior knowledge of the clean speech. In this paper, a novel quasi-clean speech reconstruction method for non-intrusive SQE is proposed. The method combines Bayesian NMF (BNMF) with a deep neural network (DNN), taking advantage of both NMF and DNN. BNMF is used to calculate the basic spectro-temporal matrices of the target speech, and the obtained matrices are integrated into the DNN model as an individual layer. The DNN is then trained to learn the complex mapping between the target source and the mixture signal, and to reconstruct the magnitude spectrograms of the quasi-clean speech. Finally, the reconstructed speech is used as the reference of the perceptual model to estimate the mean opinion score of the tested noisy sample. The experimental results show that the proposed method outperforms the compared non-intrusive SQE algorithms under challenging conditions in terms of objective measurement.

Keywords Speech quality evaluation · Non-intrusive · Bayesian NMF · Deep neural network
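The pipeline summarized in the abstract (learn spectral bases from target speech, then use them as a fixed layer that maps predicted activations to a magnitude spectrogram) can be illustrated with a minimal sketch. This is not the authors' implementation: plain NMF with Euclidean multiplicative updates stands in for Bayesian NMF, the one-hidden-layer network is left untrained, the data are synthetic, and the names `nmf_basis` and `reconstruct_magnitude` are hypothetical.

```python
import numpy as np


def nmf_basis(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    Assumption: plain NMF with Euclidean multiplicative updates is used
    here as a simple stand-in for the Bayesian NMF named in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral bases
    H = rng.random((rank, n_frames)) + eps  # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruct_magnitude(noisy_mag, W, hidden_w, hidden_b):
    """Forward pass of a toy network whose last layer is the fixed basis W.

    A ReLU hidden layer predicts non-negative activations for each frame,
    and the basis layer maps them to a magnitude spectrogram.
    """
    activations = np.maximum(0.0, hidden_w @ noisy_mag + hidden_b[:, None])
    return W @ activations


# Toy usage on synthetic data (hypothetical sizes: 16 bins, 40 frames).
rng = np.random.default_rng(1)
clean = rng.random((16, 3)) @ rng.random((3, 40))  # low-rank "clean" magnitudes
noisy = clean + 0.1 * rng.random(clean.shape)      # additive non-negative noise

W, _ = nmf_basis(clean, rank=3)
hidden_w = 0.1 * rng.standard_normal((3, 16))      # untrained weights
hidden_b = np.zeros(3)
quasi_clean = reconstruct_magnitude(noisy, W, hidden_w, hidden_b)
print(quasi_clean.shape)  # (16, 40)
```

In a trained system the hidden-layer weights would be fitted so that the output approximates the clean magnitudes. Keeping the basis layer non-negative and fixed means every output frame is a non-negative combination of the learned speech bases.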

1 Introduction Speech quality evaluation (SQE) is important for audio processing systems and quality of service (QoS). The main reason is that users can reasonably choose higher-quality service providers based on speech quality. To this end, communication providers need accurate, well-performing SQE methods to assess speech quality automatically, thereby reducing costs, optimizing and maintaining the network, and providing better service to customers [1]. Actual voice communication often takes place in complex environments such as stations, airports, restaurants, cars, and streets, and channel noise may also occur during communication. These persistent, complex environmental noises greatly affect people's communication. Therefore, efficient SQE methods for complex environments are often used in mobile communication and short-wave communication systems, and have become a research direction with broad prospects [2]. Besides communication, high-tech medical electronic instruments such as hearing aids (HAs) also benefit from SQE. SQE of hearing aids is an important factor affecting customer acceptance, and is of great interest to audiologists, hearing researchers and device manufacturers [3].

* Weili Zhou [email protected]
1 School of Electronic and Information Engineering, Foshan University, Foshan, People's Republic of China

1.1 Related work in SQE S
