Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization


Md. Imran Hossain (1) · Md. Shohidul Islam (2) · Mst. Titasa Khatun (3) · Rizwan Ullah (1) · Asim Masood (1) · Zhongfu Ye (1)

Received: 2 December 2019 / Revised: 29 September 2020 / Accepted: 6 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Circuits, Systems, and Signal Processing

(1) National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230026, Anhui, China
(2) Department of CSE, Islamic University, Kushtia 7003, Bangladesh
(3) Department of ICE, Islamic University, Kushtia 7003, Bangladesh

Zhongfu Ye [email protected] · Md. Imran Hossain [email protected] · Md. Shohidul Islam [email protected] · Mst. Titasa Khatun [email protected] · Rizwan Ullah [email protected] · Asim Masood [email protected]

Abstract

In this article, we propose a new source separation method in which the dual-tree complex wavelet transform (DTCWT) and the short-time Fourier transform (STFT) are applied sequentially as dual transforms, and sparse nonnegative matrix factorization (SNMF) is used to factorize the magnitude spectrum. STFT-based source separation faces issues related to time and frequency resolution because it cannot exactly determine which frequencies exist at what time. Discrete wavelet transform (DWT)-based source separation faces a time-variation-related problem (i.e., a small shift in the time-domain signal causes significant variation in the energy of the wavelet coefficients). To address these issues, we utilize the DTCWT, which comprises two-level trees with different sets of filters and provides additional information for analysis and approximate shift invariance; these properties enable the perfect reconstruction of the time-domain signal. Thus, the time-domain signal is transformed into a set of subband signals in which low- and high-frequency components are isolated. Next, each subband is passed through the STFT, and a complex spectrogram is constructed. Then, SNMF is applied to decompose the magnitude part into a weighted linear combination of the trained basis vectors for both sources. Finally, the estimated signals are obtained through a subband binary ratio mask by applying the inverse STFT (ISTFT) and the inverse DTCWT (IDTCWT). The proposed method is examined on speech separation tasks utilizing the GRID audiovisual and TIMIT corpora. The experimental findings indicate that the proposed approach outperforms the existing methods.

Keywords Speech separation (SS) · Dual-tree complex wavelet transform (DTCWT) · Sparse nonnegative matrix factorization (SNMF) · Short-time Fourier transform (STFT)
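The factorization and masking steps summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a Euclidean cost with an L1 penalty on the activations, multiplicative updates, and a soft ratio mask; the excerpt does not specify the cost function, the update rules, or the exact mask definition. The function names `snmf` and `separate` and all parameter values are hypothetical. The DTCWT and STFT stages are omitted: the inputs are taken to be nonnegative magnitude spectrograms of a single subband.

```python
import numpy as np

def snmf(V, rank, sparsity=0.1, n_iter=200, W=None, seed=0, eps=1e-9):
    """Sparse NMF sketch: V ~ W @ H with an L1 penalty on H (assumed cost).

    If W is supplied, it is held fixed and only H is estimated,
    which corresponds to the separation stage with trained bases.
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for H; the sparsity weight enters the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            # Normalize basis columns and rescale H so that W @ H is unchanged.
            norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
            W /= norms
            H *= norms.T
    return W, H

def separate(V_mix, W1, W2, sparsity=0.1, n_iter=200, eps=1e-9):
    """Estimate two source magnitudes from a mixture using trained bases."""
    W = np.hstack([W1, W2])            # concatenated dictionary for both sources
    _, H = snmf(V_mix, W.shape[1], sparsity, n_iter, W=W)
    k = W1.shape[1]
    S1 = W1 @ H[:k]                    # initial estimate of source 1
    S2 = W2 @ H[k:]                    # initial estimate of source 2
    mask1 = S1 / (S1 + S2 + eps)       # soft ratio mask (an assumption here)
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

In a full pipeline, `snmf` would first be run on each source's training magnitudes to obtain `W1` and `W2`. `separate` would then be applied per subband to the mixture magnitude, and the masked magnitudes would be recombined with the mixture phase before the ISTFT and IDTCWT.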

1 Introduction

Source separation (SS) is a procedure for isolating a set of source signals from an observed or mixed signal. Single-channel SS (SCSS) has become important in many real-world applications, such as communication, multimedia, and the cocktail-party problem. Although devices for SCSS have many obvious p