Power Spectral Density Error Analysis of Spectral Subtraction Type of Speech Enhancement Methods

Research Article

Peter Händel

Signal Processing Lab, School of Electrical Engineering, Royal Institute of Technology, SE-100 44 Stockholm, Sweden

Received 8 August 2005; Revised 28 April 2006; Accepted 16 July 2006

Recommended by Richard Heusdens

A theoretical framework for analysis of speech enhancement algorithms is introduced for performance assessment of spectral subtraction type of methods. The quality of the enhanced speech is related to physical quantities of the speech and noise (such as stationarity time and spectral flatness), as well as to design variables of the noise suppressor. The derived theoretical results are compared with the outcome of subjective listening tests as well as successful design strategies, performed by independent research groups.

Copyright © 2007 Peter Händel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Human speech generates complex acoustic waves that are sometimes aimed at a nearby listener and sometimes intended for transmission by technical systems such as radio broadcasting, fixed or wireless telephony, or services based on the Internet Protocol. A fundamental feature of digital transmission schemes is that the recorded speech samples are coded (compressed) in order to relax the bandwidth requirement of the transmission channel. The compression, or speech coding, is typically model-based and optimized for speech signals. Since the encoder is designed for speech, it is not suitable for compression of other sources such as environmental noise or music. Accordingly, reduction of noise in received noise-contaminated speech samples is a problem of great importance. Digital noise reduction schemes are considered in several cellular systems. Early work includes the scheme employing a Kalman filter standardized for the Pacific Digital Cellular system [1]. In the enhanced variable rate codec (EVRC) for CDMA mobile telephony systems, a frequency-domain noise reduction system is included [2]. In these telephony applications, it is important that the algorithms produce enhanced speech with marginal distortion. Based on experimental studies and listener tests, several research groups have independently reported improvements in signal-to-noise ratio (SNR) of the order of 10 dB without introducing audible artifacts or distortion. Yang reported 9 dB SNR improvement for a frequency-domain noise reduction algorithm [3], Gibson et al. reported figures near 7 dB [4], while Sörqvist et al. reported a figure of 10 dB [5]. The latter two methods employ time-domain Kalman filters. Note that the cited works use different SNR measures and speech material, so a direct comparison of the SNR figures is not suitable. Theoretical limits for speech enhancement were studied in [6], wher