Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach


Christian Feldbauer
Signal Processing and Speech Communication Laboratory, Graz University of Technology, 8010 Graz, Austria
Email: [email protected]

Gernot Kubin
Signal Processing and Speech Communication Laboratory, Graz University of Technology, 8010 Graz, Austria
Email: [email protected]

W. Bastiaan Kleijn
Department for Signals, Sensors and Systems, KTH (Royal Institute of Technology), 10044 Stockholm, Sweden
Email: [email protected]

Received 14 November 2003; Revised 25 August 2004

Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

Keywords and phrases: speech and audio coding, auditory representation, auditory model inversion, auditory synthesis, perceptual domain coding, multiple description coding.

1. INTRODUCTION

1.1. Motivation

The encoding of an analog signal at a finite rate requires quantization and introduces distortion. Models of the human auditory system can be exploited to minimize, for a given rate (specified either as an average or as a fixed rate), the audible distortion (as quantified by the model) introduced by the encoding [1, 2, 3]. Signal features will then be specified with a precision that reflects audible distortion. However, the introduction of knowledge of the auditory system into coding has been handicapped by delay and computational constraints. For instance, temporal masking and the adaptation of the hearing system to a stimulus are highly nonlinear effects [4, 5]. A time-localized quantization error in the perceived signal can result in a significant change in the auditory nerve firings over a response time interval that can last on the order of hundreds of milliseconds. Therefore, the effect of time-lo

This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
