Mapping Speech Spectra from Throat Microphone to Close-Speaking Microphone: A Neural Network Approach

Research Article

A. Shahina¹ and B. Yegnanarayana²

¹ Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600036, India
² International Institute of Information Technology, Gachibowli, Hyderabad 500032, India

Received 4 October 2006; Accepted 25 March 2007

Recommended by Jiri Jan

Speech recorded from a throat microphone is robust to surrounding noise but sounds unnatural, unlike speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.

Copyright © 2007 A. Shahina and B. Yegnanarayana. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

A speech signal collected by a vibration pickup (called a throat microphone) placed at the throat, near the glottis, is clean but does not sound as natural as speech from a normal (close-speaking) microphone. Mapping the speech spectra from the throat microphone to the normal microphone aims at improving the perceptual quality of the slightly muffled and “metallic” speech from the throat microphone. This would reduce the discomfort caused by prolonged listening to throat microphone speech in the adverse environments where it is currently used, such as aircraft cockpits, and machine shops and engine rooms with intense noise from running engines.

Mapping the speech spectra involves the following stages. The first stage, training, involves recording speech from a speaker simultaneously with the throat microphone and the normal microphone. Simultaneous recording is essential for understanding the differences between the components of speech in the two signals and for training appropriate models to capture the mapping between their spectra. Suitable speech features are extracted from the speech signals. During training, the feature vectors extracted from the throat microphone (TM) speech are mapped onto the corresponding feature vectors extracted from the normal microphone (NM) speech. In the second stage, testing, feature vectors corresponding to t