Sparse Spectrotemporal Coding of Sounds


David J. Klein
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Email: [email protected]

Peter König
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Email: [email protected]

Konrad P. Körding
Institute of Neurology, University College London, Queen Square, London, WC1N 3BG, UK
Email: [email protected]

Received 1 May 2002 and in revised form 28 January 2003

Recent studies of biological auditory processing have revealed that sophisticated spectrotemporal analyses are performed by central auditory systems of various animals. The analysis is typically well matched with the statistics of relevant natural sounds, suggesting that it produces an optimal representation of the animal’s acoustic biotope. We address this topic using simulated neurons that learn an optimal representation of a speech corpus. As input, the neurons receive a spectrographic representation of sound produced by a peripheral auditory model. The output representation is deemed optimal when the responses of the neurons are maximally sparse. Following optimization, the simulated neurons are similar to real neurons in many respects. Most notably, a given neuron only analyzes the input over a localized region of time and frequency. In addition, multiple subregions either excite or inhibit the neuron, together producing selectivity to spectral and temporal modulation patterns. This suggests that the brain’s solution is particularly well suited for coding natural sound; therefore, it may prove useful in the design of new computational methods for processing speech.

Keywords and phrases: sparse coding, natural sounds, spectrotemporal receptive fields, spectral representation of speech.

1. INTRODUCTION

The brain evolves over both individual and evolutionary timescales, embedded in the properties of the real world. It thus seems that the properties of any sensory system should be matched with the statistics of the natural stimuli it typically operates on [1]. This would suggest that the functionality of sensory neurons can be understood in terms of coding optimally for natural stimuli. This line of inquiry has been fruitful in the visual modality. Many properties of the mammalian visual system can be explained as yielding optimally sparse neural responses to pictures of natural scenes. Within this paradigm, it is possible to reproduce the properties of neurons in the lateral geniculate nucleus (LGN) [2] and of simple cells in the primary visual cortex [3, 4, 5]. The term “sparse representation” is used in these studies with one of two distinct, albeit related, meanings: (1) neurons of the population should have significantly distinct functionality in order to avoid redundancy, and (2) the neurons should exhibit sparse activity over time such that their activity level is often close to zero, but is occasionally very high. A lar