Hammerstein Model for Speech Coding


Jari Turunen
Department of Information Technology, Tampere University of Technology, Pori, Pohjoisranta 11, P.O. Box 300, FIN-28101 Pori, Finland

Juha T. Tanttu
Department of Information Technology, Tampere University of Technology, Pori, Pohjoisranta 11, P.O. Box 300, FIN-28101 Pori, Finland

Pekka Loula
Department of Information Technology, Tampere University of Technology, Pori, Pohjoisranta 11, P.O. Box 300, FIN-28101 Pori, Finland

Received 7 January 2003 and in revised form 19 June 2003

A nonlinear Hammerstein model is proposed for coding speech signals. Using Tsay’s nonlinearity test, we first show that the great majority of speech frames contain nonlinearities (over 80% in our test data) when using 20-millisecond speech frames. Frame length correlates with the level of nonlinearity: the longer the frames the higher the percentage of nonlinear frames. Motivated by this result, we present a nonlinear structure using a frame-by-frame adaptive identification of the Hammerstein model parameters for speech coding. Finally, the proposed structure is compared with the LPC coding scheme for three phonemes /a/, /s/, and /k/ by calculating the Akaike information criterion of the corresponding residual signals. The tests show clearly that the residual of the nonlinear model presented in this paper contains significantly less information compared to that of the LPC scheme. The presented method is a potential tool to shape the residual signal in an encode-efficient form in speech coding.

Keywords and phrases: nonlinear, speech coding, Hammerstein model.
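As a rough illustration of the model class named in the abstract, a Hammerstein model is a memoryless nonlinearity followed by a linear dynamic block. The sketch below assumes a polynomial nonlinearity and an IIR linear part; the function names, the polynomial form, and the coefficient values are illustrative assumptions, and it is not the frame-by-frame identification scheme proposed in the paper.

```python
from typing import List, Sequence


def static_nonlinearity(u: float, poly: Sequence[float]) -> float:
    """Memoryless polynomial: poly[0]*u + poly[1]*u**2 + ... (assumed form)."""
    return sum(c * u ** (k + 1) for k, c in enumerate(poly))


def hammerstein(u: Sequence[float], poly: Sequence[float],
                b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Static nonlinearity followed by a linear filter.

    x[n] = f(u[n])
    y[n] = sum_i b[i]*x[n-i] - sum_j a[j]*y[n-1-j]
    """
    # Stage 1: apply the memoryless nonlinearity sample by sample.
    x = [static_nonlinearity(s, poly) for s in u]
    # Stage 2: run the linear difference equation on the transformed input.
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - 1 - j] for j in range(len(a)) if n - 1 - j >= 0)
        y.append(acc)
    return y


# Hypothetical coefficients: f(u) = u + 0.5*u**2, y[n] = x[n] + 0.5*y[n-1].
impulse_response = hammerstein([1.0, 0.0, 0.0], [1.0, 0.5], [1.0], [-0.5])
```

With `poly = [1.0]` the nonlinearity reduces to the identity and the structure collapses to an ordinary linear filter, which is the sense in which a linear predictor is a special case of this model class.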

1. INTRODUCTION

Due to the solid theory underlying linear systems, the most widely used speech coding methods to date have been linear. Numerous modifications of those methods have been proposed. At the same time, however, the application of nonlinear methods to speech coding has become increasingly popular. An early example of nonlinear speech coding is the A-law/µ-law compression scheme in pulse code modulation (PCM) quantization. With A-law (8 bits per sample) or µ-law (7 bits per sample) compression, a total saving of 4–5 bits per sample can be achieved compared to linear quantization (12 bits per sample). However, these nonlinearities do not involve modeling and are based purely on the fact that the human hearing system has logarithmic characteristics.

Probably the best-known linear model-based speech coding scheme is linear predictive coding (LPC), where the model parameters together with information about the residual signal need to be transmitted. For example, in the ITU-T G.723.1 speech encoder, the linear predictive filter coefficients can be represented using only 24 bits while the excitation signal requires either 165 bits (6.3 kbps mode) or 134 bits (5.3 kbps mode). In analysis-by-synthesis coders, such as G.723.1, the excitation signal is used for speech synthesis to excite the linear filter to produc
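
Two quantitative points from this introduction, logarithmic companding and the G.723.1 bit budget, can be checked with a short sketch. The continuous µ-law formula with µ = 255 and the 30 ms frame length of G.723.1 are not stated in the text above and are assumptions brought in here; the function names are illustrative only.

```python
import math

MU = 255.0  # assumed companding constant (common telephony value)


def mu_compress(x: float, mu: float = MU) -> float:
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def bitrate_bps(bits_per_frame: int, frame_ms: float) -> float:
    """Bit rate implied by a per-frame bit budget."""
    return bits_per_frame * 1000.0 / frame_ms


# Small amplitudes get a disproportionately large share of the output range,
# which is why fewer bits per sample suffice after companding.
quiet = mu_compress(0.01)

# Bit budget from the text: 24 bits for filter coefficients plus excitation,
# assuming 30 ms frames.
high_rate = bitrate_bps(24 + 165, 30.0)  # 6300.0 bps
low_rate = bitrate_bps(24 + 134, 30.0)   # roughly 5267 bps
```

Under the 30 ms assumption, the excitation accounts for 165 of 189 bits in the higher-rate mode and 134 of 158 bits in the lower-rate mode, which is why the text singles out the residual as the expensive part of the bit stream.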
