Gain Adapted Optimum Mixture Estimation Scheme for Single Channel Speech Separation


Divneet Singh Kapoor · Amit Kumar Kohli

Received: 13 May 2012 / Revised: 5 February 2013
© Springer Science+Business Media New York 2013

Abstract This paper presents the proof of an Optimum mixture estimator for single channel speech separation, the task of separating two speech signals from a single recording of their mixture. The work addresses a fundamental limitation of current single channel speech separation techniques, which assume that the data used in the training and test phases of the separation model have the same energy levels. To overcome this limitation, a gain adapted Optimum mixture estimator is derived, which estimates the mixture of speech signals under different signal-to-signal ratios (SSRs). Specifically, the speakers’ gains are incorporated as unknown parameters into the separation model, and the estimator is then derived in terms of the source distributions and the SSR. It is demonstrated that the Optimum mixture estimator yields a lower estimation error than the Mixture-Maximization (MixMax) and Quadratic estimators, which are based on non-linear mapping (log and inverse-log operations). Experimental results on real speech data also show that the proposed estimator significantly improves mixture estimation performance compared with the MixMax and Quadratic estimators with gain adaptation.

Keywords Single channel speech separation (SCSS) · Optimum mixture estimator · Mixture-maximization (MixMax) · Quadratic estimator · Gain adaptation
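As an illustration of the energy mismatch discussed in the abstract, the following minimal sketch mixes two signals at a chosen SSR. It assumes the common definition of SSR as the energy ratio of the two sources in decibels, which this excerpt does not spell out. The function names and the white-noise test signals are hypothetical placeholders and are not taken from the paper.

```python
import math
import random


def energy(signal):
    """Return the sum of squared samples."""
    return sum(s * s for s in signal)


def ssr_db(x1, x2):
    """Signal-to-signal ratio of x1 relative to x2 in dB (assumed definition)."""
    return 10.0 * math.log10(energy(x1) / energy(x2))


def mix_at_ssr(x1, x2, target_ssr_db):
    """Scale x2 so the x1-to-x2 energy ratio equals target_ssr_db, then add.

    Returns the mixture and the gain that was applied to x2.
    """
    ratio = 10.0 ** (target_ssr_db / 10.0)
    gain = math.sqrt(energy(x1) / (energy(x2) * ratio))
    scaled_x2 = [gain * s for s in x2]
    mixture = [a + b for a, b in zip(x1, scaled_x2)]
    return mixture, gain


# Synthetic stand-ins for two speakers (assumption: white noise, not real speech).
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(1000)]

mixture, gain = mix_at_ssr(x1, x2, target_ssr_db=6.0)
achieved = ssr_db(x1, [gain * s for s in x2])
print(f"gain on speaker 2: {gain:.4f}, achieved SSR: {achieved:.2f} dB")
```

A model trained on sources at one SSR and tested on a mixture built with a different `target_ssr_db` sees exactly the kind of gain mismatch that the gain adapted estimator is meant to absorb.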

D.S. Kapoor
Department of Electronics and Communication Engineering, Chandigarh Group of Colleges, Gharuan, Mohali, India
e-mail: [email protected]

A.K. Kohli
Department of Electronics and Communication Engineering, Thapar University, Patiala 147004, Punjab, India
e-mail: [email protected]

Circuits Syst Signal Process

1 Introduction

In recent years, great interest in solving open problems in the field of speech processing has emerged along with the development of advanced machine learning [10]. One such problem is single channel speech separation (SCSS), in which the goal is to estimate the underlying speech signals X_i(t) from the observed mixture signal Y(t), which for the two-speaker case is expressed as Y(t) = X_1(t) + X_2(t). The process of SCSS consists of three main stages: Analysis, Separation and Reconstruction. The central separation stage is the heart of the system, in which the target speech is separated from the interfering speech. Since the separation process works on one segment of speech at a time, it is crucial to accurately classify each segment as single-speaker or multi-speaker before the separation. The precise estimation of each speaker’s speech model parameters is another important task in the analysis stage. The speech signal of the desired speaker is finally synthesized from its estimated parameters in the reconstruction stag