Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

Alessandro Bugatti
Department of Electronics for Automation, University of Brescia, Via Branze 38, 25123 Brescia, Italy

Alessandra Flammini
Department of Electronics for Automation, University of Brescia, Via Branze 38, 25123 Brescia, Italy

Pierangelo Migliorati
Department of Electronics for Automation, University of Brescia, Via Branze 38, 25123 Brescia, Italy

Received 27 July 2001 and in revised form 8 January 2002

We address the problem of classifying audio into speech and music for multimedia applications. In particular, we compare two different techniques for speech/music discrimination. The first method is based on the zero crossing rate and Bayesian classification. It is computationally very simple and gives good results for pure music or pure speech. The simulation results show, however, that performance degrades when the music segment also contains speech superimposed on the music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically, a multilayer perceptron). In this case we obtain better performance, at the expense of a limited increase in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even on low-cost embedded systems.

Keywords and phrases: speech/music discrimination, indexing of audio-visual documents, neural networks, multimedia applications.
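As a rough illustration of the feature underlying the first method, the zero crossing rate of a frame can be taken as the fraction of adjacent sample pairs whose signs differ. The sketch below is only a minimal example of that general definition; the function names, the frame length, and the hop size are assumptions made for illustration and are not taken from the paper, whose exact framing and statistics are not given in this excerpt.

```python
def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(signal, frame_len=256, hop=128):
    """Compute the zero crossing rate of each overlapping frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

A classifier such as the Bayesian one mentioned in the abstract would then operate on statistics of these per-frame values rather than on the raw samples.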

1. INTRODUCTION

Effective navigation through multimedia documents is necessary to enable widespread use of, and access to, richer and novel information sources. The design of efficient indexing techniques to retrieve relevant information is another important requirement. Automatic procedures for semantically indexing audio-video material therefore represent a very important challenge. Such methods should be designed to create indices of the audio-visual material that characterize the temporal structure of a multimedia document from a semantic point of view.

The International Organization for Standardization (ISO) started in October 1996 a standardization process for the description of the content of multimedia documents, namely MPEG-7: the “Multimedia Content Description Interface” [1, 2]. However, the standard specifications do not indicate methods for the automatic selection of indices.

One possible approach is to identify series of consecutive segments that exhibit a certain coherence according to some property of the audio-visual material. By organizing the degree of coherence according to more abstract criteria, it is possible to construct a hierarchical representation of information, so as to create a Table of Contents description of the document. Such descr
