Radial Basis Function Networks for Conversion of Sound Spectra



Carlo Drioli
Università di Padova, Dipartimento di Elettronica e Informatica (DEI), Via Gradenigo 6/a, I-35131 Padova, Italy

Received 5 April 2000 and in revised form 18 January 2001

In many advanced signal processing tasks, such as pitch shifting, voice conversion, or sound synthesis, accurate spectral processing is required. Here, the use of Radial Basis Function Networks (RBFNs) is proposed for modeling the spectral changes (or conversions) related to the control of important sound parameters, such as pitch or intensity. The identification of such conversion functions is based on a procedure that learns the shape of the conversion from a few pairs of target spectra taken from a data set. The generalization properties of RBFNs provide interpolation with respect to the pitch range. In the construction of the training set, mel-cepstral encoding of the spectrum is used to capture the perceptually most relevant spectral changes. Moreover, a singular value decomposition (SVD) approach is used to reduce the dimension of the conversion functions. The RBFN conversion functions introduced are characterized by a perceptually based, fast training procedure, desirable interpolation properties, and computational efficiency.

Keywords and phrases: sound transformations, sinusoidal representation, RBFNs, spectral processing.
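As a rough illustration of the interpolation idea summarized above, the following minimal sketch fits a Gaussian RBFN that maps a scalar control parameter (a normalized pitch) to a vector of spectral coefficients. It is not the training procedure of this paper: the toy data, the placement of centers on the training pitches, the basis width, the least-squares fit of the output weights, and all function names are assumptions made for the example, and the mel-cepstral encoding and SVD reduction are omitted.

```python
import numpy as np

def gaussian_design(x, centers, width):
    """Gaussian basis activations: one row per input, one column per center."""
    d = x[:, None] - centers[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

def fit_rbfn(x, Y, centers, width):
    """Fit the linear output weights by least squares (assumed training rule)."""
    Phi = gaussian_design(x, centers, width)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

def predict_rbfn(x, centers, width, W):
    """Evaluate the network at new control values."""
    return gaussian_design(x, centers, width) @ W

# Toy training set (hypothetical): 5 normalized pitches, each paired with
# an 8-coefficient target vector that varies smoothly with pitch.
pitches = np.linspace(0.0, 1.0, 5)
k = np.arange(8)
targets = np.cos(np.outer(pitches, k))      # shape (5, 8)

centers = pitches.copy()                    # one basis function per example
width = 0.25                                # assumed basis width
W = fit_rbfn(pitches, targets, centers, width)

fitted = predict_rbfn(pitches, centers, width, W)           # training pitches
interp = predict_rbfn(np.array([0.4]), centers, width, W)   # unseen pitch
```

With one basis function per training example, the network reproduces the training targets and returns a smooth estimate at intermediate pitches. How good that estimate is depends strongly on the assumed width and on how densely the pitch range is sampled.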

1. INTRODUCTION

In the field of speech and audio processing, a large number of applications have been proposed that realize high-level transformations by combining simpler effects such as time-scale modification, pitch shifting, amplitude envelope modification, and spectral processing. Most of these applications are based on a sinusoidal representation of the signal. In this work we focus on spectral processing, and we stress the importance of an accurate representation and characterization of the spectrum when modeling real data in both audio processing and synthesis applications.

Among the most important and recent applications that involve spectral processing, time-scale and pitch modification have been widely explored, especially in the speech processing field, and the problem of correctly reproducing the spectral characteristics has been stressed [1]. Recently, a new spectral processing approach was proposed by Stylianou et al. [2], in which a conversion function was built from training examples and used to convert the spectral features of one speaker into those of a second speaker who uttered the same sentence.

Outside speech processing, sinusoidal modeling of sound has mainly attracted interest in the computer music field. Analysis-based additive sound synthesis is effective owing to the high quality of the tones generated and the high degree of control. In the work by Horner and Beauchamp [3], additive synthesis based on Short-Time Fourier Transform (STFT) analysis is used as the engine for sound generation, and a dynamic filter is used