Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units


T. Nagarajan
Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600036, India
Email: [email protected]

H. A. Murthy
Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600036, India
Email: [email protected]

Received 16 January 2004; Revised 17 June 2004; Recommended for Publication by Chin-Hui Lee

In the development of a syllable-centric automatic speech recognition (ASR) system, segmentation of the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it has to be processed before segment boundaries can be extracted. This paper presents a subband-based group delay approach to segment spontaneous speech into syllable-like units. This technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as a magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representative of the STE function for syllable boundary detection. Although the group delay function derived from the STE function of the speech signal contains segment boundaries, the boundaries are difficult to determine in the context of long silences, semivowels, and fricatives. In this paper, these issues are specifically addressed and algorithms are developed to improve the segmentation performance. The speech signal is first passed through a bank of three filters, corresponding to three different spectral bands. The STE functions of these signals are computed. Using these three STE functions, three minimum-phase group delay functions are derived. By combining the evidence derived from these group delay functions, the syllable boundaries are detected. Further, a multiresolution-based technique is presented to overcome the problem of shift in segment boundaries during smoothing. Experiments carried out on the Switchboard and OGI-MLTS corpora show that the error in segmentation is at most 25 milliseconds for 67% and 76.6% of the syllable segments, respectively.

Keywords and phrases: group delay, minimum-phase signal, syllable, subband-based segmentation.
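The core idea stated in the abstract, treating the STE function as a magnitude spectrum and deriving a minimum-phase group delay function from it through the cepstrum, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sum-of-squares STE definition, the `lifter_len` and `eps` parameters, and the use of the log cepstrum are all assumptions made for illustration, and the three-band filter bank and the combination of evidence across bands are not shown.

```python
import numpy as np


def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame (one common STE definition, assumed here)."""
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])


def min_phase_group_delay(ste, lifter_len, eps=1e-12):
    """Group delay of the minimum-phase signal whose magnitude spectrum is `ste`.

    `lifter_len` (hypothetical name) is the number of cepstral terms kept;
    smaller values give a smoother group delay function.
    """
    e = np.asarray(ste, dtype=float) + eps      # eps guards log(0) in silence
    n = len(e)
    # Mirror the STE so it looks like the magnitude spectrum of a real signal:
    # sym[k] == sym[(m - k) % m], with m = 2n - 2.
    sym = np.concatenate([e, e[-2:0:-1]])
    m = len(sym)
    # Real cepstrum of the pseudo-spectrum.
    c = np.fft.ifft(np.log(sym)).real
    # Minimum-phase complex cepstrum: causal, positive quefrencies doubled.
    # Truncating it to `keep` terms is what smooths the result.
    keep = min(lifter_len, m // 2)
    c_min = np.zeros(m)
    c_min[0] = c[0]
    c_min[1:keep] = 2.0 * c[1:keep]
    # tau(w) = sum_n n * c_min[n] * cos(w n) = Re{ DFT(n * c_min[n]) }.
    tau = np.fft.fft(np.arange(m) * c_min).real
    return tau[:n]
```

In this sketch the zeroth cepstral coefficient is multiplied by n = 0, so the resulting group delay function does not depend on the overall scale of the STE function; only its shape matters. How the group delay function is then turned into boundary decisions is left to the method described in the paper.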

1. INTRODUCTION

One of the major reasons for considering the syllable as a basic unit for automatic speech recognition (ASR) systems is its better representational and durational stability relative to the phoneme [1]. The syllable was proposed as a unit for ASR as early as 1975 [2], in which irregularities in phonetic manifestations of phonemes were discussed. It was argued that the syllable will serve as an effective minimal unit in the time domain. In [3], it is demonstrated that segmentation at syllable-like units followed by isolated style re
