Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network



Shanfa Ke 1,2 · Ruimin Hu 1,2 · Xiaochen Wang 1,2 · Tingzhao Wu 1,3 · Gang Li 1,3 · Zhongyuan Wang 1,3

Received: 15 September 2019 / Revised: 15 June 2020 / Accepted: 21 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The recently proposed deep clustering-based algorithms represent a fundamental advance towards the single-channel multi-speaker speech separation problem. These methods use an ideal binary mask (IBM) to construct the objective function and the K-means clustering method to estimate the ideal binary mask. However, when the sources belong to the same class or the number of sources is large, the assumption that one time-frequency unit of the mixture is dominated by only one source becomes weak, and IBM-based separation causes spectral holes or aliasing. In our work, we instead propose the quantized ideal ratio mask: the ideal ratio mask is quantized so that the output of the neural network takes only a limited number of possible values. The quantized ideal ratio mask is then used to construct the objective function for the case of multi-source domination, improving network performance. Furthermore, a network framework that combines a residual network, a recurrent network, and a fully connected network is used to exploit frequency correlation information. We evaluated our system on the TIMIT dataset and show a 1.6 dB SDR improvement over previous state-of-the-art methods.

Keywords: Multi-speaker · Speech separation · Deep clustering · Quantized IRM · Residual network
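The core idea in the abstract can be sketched numerically: compute an ideal ratio mask (source magnitude over mixture magnitude per time-frequency unit) and snap each mask value to the nearest of a small set of levels, so the network's per-unit output becomes a classification over those levels rather than a free continuous value. This is a minimal NumPy sketch, not the authors' implementation; the function names, the number of levels, and the uniform level spacing are illustrative assumptions.

```python
import numpy as np

def ideal_ratio_mask(source_mag, mixture_mag, eps=1e-8):
    """Ideal ratio mask: per T-F unit, the fraction of the mixture
    magnitude attributed to this source, clipped to [0, 1]."""
    return np.clip(source_mag / (mixture_mag + eps), 0.0, 1.0)

def quantize_mask(mask, num_levels=4):
    """Quantize mask values in [0, 1] to num_levels uniformly spaced
    levels. Returns the quantized mask and the level indices, which
    can serve as class targets for a network output layer."""
    levels = np.linspace(0.0, 1.0, num_levels)
    # Nearest-level assignment per T-F unit.
    indices = np.argmin(np.abs(mask[..., None] - levels), axis=-1)
    return levels[indices], indices
```

With `num_levels=2` this degenerates to an ideal binary mask, which makes the relationship between the two mask types explicit: the quantized IRM interpolates between the hard IBM assumption and a fully continuous ratio mask.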

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11042-020-09419-y) contains supplementary material, which is available to authorized users.

* Ruimin Hu [email protected]

1 National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China

2 Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China

3 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China

Multimedia Tools and Applications

1 Introduction

Human beings have an extraordinary ability to selectively attend to one speaker in the presence of other speakers and background noise, the so-called cocktail-party effect. But solving this cocktail party problem [5] has proven extremely challenging for computers. Speech separation gives computers this skill: recovering the speech source signals of interest from one or more observed mixtures. It is an attractive research field with many applications, e.g. automatic speech recognition (ASR) [10, 26], speech enhancement [32] and hearing aids [28]. Driven by these applications, speech separation has been extensively studied over the past decades. One well-known approach is independent component analysis (ICA) [4, 16], which separates the mixture by estimating an unmixing matrix (t