Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation
ORIGINAL PAPER
Linhui Sun1 · Ge Zhu1 · Pingan Li1
Received: 9 October 2019 / Revised: 5 February 2020 / Accepted: 18 March 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract Single-channel speech separation (SCSS) plays an important role in speech processing. It is an underdetermined problem, since several signals must be recovered from a single channel, which makes it difficult to solve. To achieve SCSS more effectively, we propose a new cost function. A joint constraint algorithm based on this function is then used to separate mixed speech signals, with the aim of accurately separating two sources at the same time. The joint constraint algorithm not only penalizes the residual sum of squares but also exploits the joint relationship between the outputs to train the dual-output DNN. With these joint constraints, the training accuracy of the separation model can be further increased. We evaluate the performance of the proposed algorithm on the GRID corpus. The experimental results show that the new algorithm obtains better speech intelligibility than the basic cost function. It also performs better in terms of source-to-distortion ratio, signal-to-interference ratio, source-to-artifact ratio and perceptual evaluation of speech quality.
Keywords Deep neural network (DNN) · Single-channel speech separation · Joint constraint · Cost function · Dual outputs
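The abstract describes a cost that combines a residual-sum-of-squares term with a term linking the two network outputs, but this excerpt does not give its exact form. The following is a minimal sketch under stated assumptions, not the authors' formula: the joint term is assumed to penalize the mismatch between the sum of the two estimates and the sum of the two references, and the names `basic_cost`, `joint_cost` and the weight `lam` are hypothetical.

```python
import numpy as np


def basic_cost(est1, est2, ref1, ref2):
    """Residual sum of squares over both outputs of a dual-output network."""
    return float(np.sum((est1 - ref1) ** 2) + np.sum((est2 - ref2) ** 2))


def joint_cost(est1, est2, ref1, ref2, lam=0.5):
    """Basic cost plus an ASSUMED joint term coupling the two outputs.

    The joint term penalizes the squared difference between the sum of the
    estimates and the sum of the references. This form and the weight `lam`
    are illustrative assumptions, not taken from the paper.
    """
    joint = float(np.sum(((est1 + est2) - (ref1 + ref2)) ** 2))
    return basic_cost(est1, est2, ref1, ref2) + lam * joint


# Toy magnitude-spectrum frames for two sources (made-up values).
ref1 = np.array([1.0, 2.0])
ref2 = np.array([3.0, 4.0])

# Errors of opposite sign cancel in the sum, so the joint term is zero.
cancelling = joint_cost(ref1 + 0.1, ref2 - 0.1, ref1, ref2)

# Errors of the same sign add up, so the joint term contributes extra cost.
reinforcing = joint_cost(ref1 + 0.1, ref2 + 0.1, ref1, ref2)
```

In this sketch, two estimates with the same per-output error receive different costs depending on how their errors combine, which is one way a cost can use the relationship between the outputs rather than treating them independently.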
Linhui Sun [email protected] · Ge Zhu [email protected] · Pingan Li [email protected]
1 College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
1 Introduction
Single-channel speech separation (SCSS) is the process of separating multiple sources from one channel, and it has a wide range of applications in automatic speech recognition (ASR), hearing aids and speaker recognition [1–4]. Because of their excellent ability to model the nonlinear relationship between input features and output targets, deep neural networks (DNNs) have been widely used in speech separation [5–14]. According to the number of DNN outputs, DNN-based methods can be divided into two categories: single-output DNNs and multi-output DNNs. A single-output DNN maps the mixed signal to a single target source. For example, Han et al. used a single-output DNN to directly learn the nonlinear relationship between the magnitude spectrograms of the reverberant and clean signals, which improved denoising and de-reverberation performance [6]. Sun et al. proposed a two-stage method that addresses the monaural source separation problem with the help of a single-output DNN [7]. This type of DNN maps the mixture to one specific signal and achieves significant separation performance. However, a single-output DNN can separate only one voice source at a time, which is time-consuming. For the multi-output DNN,