Multiclass support vector machines for environmental sounds classification in visual domain based on log-Gabor filters
Souli Sameh · Zied Lachiri
Received: 8 May 2012 / Accepted: 11 August 2012 / Published online: 6 September 2012 © Springer Science+Business Media, LLC 2012
Abstract This paper presents an approach for recognizing environmental sounds in surveillance and security applications. We propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. The approach comprises three methods. In the first two methods, the spectrograms are passed through an appropriate log-Gabor filter bank, and the outputs are averaged and then undergo an optimal feature selection procedure based on a mutual information criterion. The third method follows the same steps but applies them only to three patches extracted from each spectrogram. To investigate the accuracy of the proposed methods, we conduct experiments on a large database containing 10 environmental sound classes. The classification results based on multiclass Support Vector Machines show that the second method is the most efficient, with an average classification accuracy of 89.62 %.
Keywords Environmental sounds · Visual features · Log-Gabor filters · Spectrogram · SVM multiclass
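The filtering and averaging steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `log_gabor_bank` and `log_gabor_features`, the number of scales and orientations, the centre frequency `f0`, the scale multiplier `mult`, the bandwidth ratio `sigma_ratio`, and the choice to average over all filters are assumptions made for illustration only. The mutual information selection and the SVM stage are not shown.

```python
import numpy as np


def log_gabor_bank(shape, n_scales=2, n_orients=6, f0=0.25, mult=2.0,
                   sigma_ratio=0.65):
    """Build frequency-domain log-Gabor filters (all parameters are assumed).

    Returns an array of shape (n_scales, n_orients, rows, cols).
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]   # normalized vertical frequencies
    fx = np.fft.fftfreq(cols)[None, :]   # normalized horizontal frequencies
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                   # avoid log(0); DC gain is zeroed below
    theta = np.arctan2(fy, fx)
    sigma_theta = np.pi / n_orients      # assumed angular bandwidth

    bank = np.empty((n_scales, n_orients, rows, cols))
    for s in range(n_scales):
        fc = f0 / mult ** s              # centre frequency of this scale
        radial = np.exp(-np.log(radius / fc) ** 2
                        / (2 * np.log(sigma_ratio) ** 2))
        radial[0, 0] = 0.0               # a log-Gabor filter has no DC component
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # angular distance to the filter orientation, wrapped to [-pi, pi]
            d = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            bank[s, o] = radial * np.exp(-d ** 2 / (2 * sigma_theta ** 2))
    return bank


def log_gabor_features(spectrogram, bank):
    """Filter a spectrogram with every filter and average the magnitudes."""
    spec_fft = np.fft.fft2(spectrogram)
    # Broadcasting applies each filter; ifft2 acts on the last two axes.
    responses = np.abs(np.fft.ifft2(spec_fft * bank))
    return responses.mean(axis=(0, 1)).ravel()
```

Calling `log_gabor_features(spec, log_gabor_bank(spec.shape))` on a 2-D magnitude spectrogram `spec` yields one feature vector per sound, which could then be passed to a feature selection step and a multiclass classifier.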
S. Sameh
Signal, Image and Pattern Recognition Research Unit, Dept. of Genie Electrique, ENIT, BP 37, 1002, Le Belvédère, Tunisia
e-mail: [email protected]
S. Sameh · Z. Lachiri
Dept. of Physique and Instrumentation, INSAT, BP 676, 1080, Centre Urbain, Tunisia
Z. Lachiri
e-mail: [email protected]
1 Introduction
Automatic recognition of environmental sounds is an important problem in the audio domain. A variety of features have been proposed for audio recognition (Chu et al. 2009; Rabaoui et al. 2008), including descriptors such as MFCCs, frequency roll-off, spectral centroid, zero-crossing, energy, and Linear-Frequency Cepstral Coefficients (LFCCs). These 1-D audio features can be used in combination, some or even all of them together, and the combination sometimes improves classification performance compared with individually used features. The problem is that many of these features negatively influence the quality of classification. In addition, the recognition rate decreases as the number of target classes increases, because of difficulties such as randomness and high variance (Chu et al. 2009). Recently, a new research direction has emerged which demonstrates that visual techniques can be applied to musical sounds (Yu and Slotine 2008). To explore the visual information of environmental sounds, our previous work integrated the audio texture concept as image textures (Souli and Lachiri 2011). Our goal is to develop an environmental sound classification method using advanced visual descriptors. The feature extraction method uses the time-frequency structure by means of translation-inv