Neuromimetic Sound Representation for Percept Detection and Manipulation

Dmitry N. Zotkin
Perceptual Interfaces and Reality Laboratory, Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD 20742, USA
Email: [email protected]

Taishih Chi
Neural Systems Laboratory, The Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
Email: [email protected]

Shihab A. Shamma
Neural Systems Laboratory, The Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
Email: [email protected]

Ramani Duraiswami
Perceptual Interfaces and Reality Laboratory, Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD 20742, USA
Email: [email protected]

Received 2 November 2003; Revised 4 August 2004

The acoustic wave received at the ears is processed by the human auditory system to separate different sounds along the intensity, pitch, and timbre dimensions. Conventional Fourier-based signal processing, while endowed with fast algorithms, cannot easily represent a signal along these attributes. In this paper, we discuss the creation of maximally separable sounds in auditory user interfaces. For this purpose, we use a recently proposed cortical sound representation, which performs a biomimetic decomposition of an acoustic signal, to represent and manipulate sound. We briefly review algorithms for obtaining, manipulating, and inverting the cortical representation of a sound, and we describe algorithms for manipulating signal pitch and timbre separately. These algorithms are also used to create the sound of an instrument intermediate between a “guitar” and a “trumpet.” Excellent sound quality can be achieved if processing time is not a concern, and intelligible signals can be reconstructed in reasonable processing time (about ten seconds of computation for a one-second signal sampled at 8 kHz). Work on bringing the algorithms into the real-time processing domain is ongoing.

Keywords and phrases: anthropomorphic algorithms, pitch detection, human sound perception.

1. INTRODUCTION

When a natural sound source such as a human voice or a musical instrument produces a sound, the resulting acoustic wave is generated by a time-varying excitation pattern of a possibly time-varying acoustical system, and the sound characteristics depend both on the excitation signal and on the production system. The production system (e.g., the human vocal tract, the guitar box, or the flute tube) has its own characteristic response. Varying the excitation parameters produces a sound signal with different frequency components that nevertheless retains the perceptual characteristics uniquely identifying the production system (the identity of the speaker, or the type of instrument, such as piano or violin), and even the specific type of piano on which it was produced. When one is asked to characterize this sound source using descriptions based on Fourier analysis, one discovers that concepts such as frequency and amplitude are insufficient to explain such perceptual characteri