Robust In-Car Speech Recognition Based on Nonlinear Multiple Regressions

Research Article

Weifeng Li,1 Kazuya Takeda,1 and Fumitada Itakura2

1 Graduate School of Information Science, Nagoya University, Nagoya 464-8603, Japan
2 Department of Information Engineering, Faculty of Science and Technology, Meijo University, Nagoya 468-8502, Japan

Received 31 January 2006; Revised 10 August 2006; Accepted 29 October 2006

Recommended by S. Parthasarathy

We address the problem of improving hands-free speech recognition performance in different car environments using a single distant microphone. In this paper, we propose a nonlinear multiple-regression-based enhancement method for in-car speech recognition. In order to develop a data-driven in-car recognition system, we develop an effective algorithm for adapting the regression parameters to different driving conditions. We also devise a model compensation scheme that synthesizes training data using the optimal regression parameters and selects the optimal HMM for the test speech. In isolated word recognition experiments conducted in 15 real car environments, the proposed adaptive regression approach achieves average relative word error rate (WER) reductions of 52.5% and 14.8% compared to the original noisy speech and the ETSI advanced front end, respectively.

Copyright © 2007 Weifeng Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
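To make the general idea concrete before the formal development, the following is a minimal, illustrative sketch of regression-based feature enhancement, assuming synthetic stereo (clean/noisy) training data and an off-the-shelf scikit-learn MLP regressor; the paper's actual regression model, features, and adaptation algorithm are developed in the sections below.

```python
# Illustrative sketch only (not the paper's method): learn a nonlinear
# mapping from noisy features to clean ones on paired training data,
# then enhance unseen noisy features before recognition.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Placeholder "stereo" data: rows are frames, columns are feature
# channels (24 here, an arbitrary choice for the sketch).
clean = rng.normal(size=(5000, 24))
noisy = clean + 0.5 * rng.normal(size=clean.shape)  # simulated corruption

# Nonlinear multiple regression: one MLP jointly mapping all noisy
# channels to the corresponding clean channels.
regressor = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300,
                         random_state=0)
regressor.fit(noisy, clean)

# Enhancement: estimate clean features for new noisy frames; these
# estimates would be fed to the recognizer in place of the noisy ones.
test_noisy = clean[:10] + 0.5 * rng.normal(size=(10, 24))
enhanced = regressor.predict(test_noisy)
print("MSE before:", np.mean((test_noisy - clean[:10]) ** 2))
print("MSE after: ", np.mean((enhanced - clean[:10]) ** 2))
```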

1. INTRODUCTION

The mismatch between training and testing conditions is one of the most challenging and important problems in automatic speech recognition (ASR). This mismatch may be caused by a number of factors, such as background noise, speaker variation, changes in speaking style, channel effects, and so on. State-of-the-art ASR techniques for removing the mismatch usually fall into three categories [1]: robust features, speech enhancement, and model compensation. The first approach seeks parameterizations that are fundamentally immune to noise. The most widely used speech recognition features are the Mel-frequency cepstral coefficients (MFCCs) [2]. The lack of robustness of MFCCs in noisy or mismatched conditions has led many researchers to investigate robust variants and novel feature extraction algorithms. Some of these are perceptually motivated, for example, PLP [3] and RASTA [4], while others are based on auditory processing, for example, the gammatone filter [5] and the EIH model [6].
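For reference, here is a minimal MFCC extraction sketch (not taken from the paper), assuming the librosa library and a synthetic one-second tone standing in for real speech:

```python
# Minimal MFCC extraction sketch: MFCCs [2] are the baseline features
# whose noise sensitivity motivates the robust variants cited above.
import numpy as np
import librosa

sr = 16000                               # 16 kHz sampling rate
t = np.arange(sr) / sr                   # one second of samples
y = 0.1 * np.sin(2 * np.pi * 1000 * t)   # synthetic stand-in signal

# 13 coefficients per 25 ms frame with a 10 ms hop, typical ASR settings.
mfcc = librosa.feature.mfcc(y=y.astype(np.float32), sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)  # (13, number_of_frames)
```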

The speech enhancement approach aims to perform noise reduction by transforming the noisy speech (or features) into an estimate that more closely resembles the clean speech (or features). Examples of this approach include spectral subtraction [7], Wiener filtering, cepstral mean normalization (CMN) [8], codeword-dependent cepstral normalization (CDCN) [9], and so on. Spectral subtraction was originally proposed in the context of the enhan