Deep convolutional network for urbansound classification



Sådhanå (2020) 45:210. https://doi.org/10.1007/s12046-020-01442-x


N KARTHIKA* and B JANET

Department of Computer Applications, National Institute of Technology, Tiruchirappalli 620015, India
e-mail: [email protected]; [email protected]

MS received 3 August 2019; revised 21 May 2020; accepted 23 June 2020

Abstract. The efficiency of convolutional neural networks in classifying short audio snippets of urban sounds is evaluated. The deep neural model consists of two convolutional layers, each followed by max pooling, plus three fully connected (dense) layers. The model is trained on low-level descriptors of various urban sound clips together with their deltas. The network's performance is examined on urban recordings and compared with several contemporary approaches. The model obtained 76% validation accuracy, better than conventional models that relied only on Mel-frequency cepstral coefficients (MFCCs).

Keywords. Convolutional neural networks; max pooling; rectified linear units (ReLU); UrbanSounds; multiclass classification.
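The architecture described in the abstract (two convolutional layers, each followed by max pooling, plus three dense layers) can be sketched as follows. The paper does not state layer sizes at this point, so the channel counts, kernel sizes, dense-layer widths, the 40x101 MFCC input shape and the 10-class output are illustrative assumptions, and PyTorch is used only as convenient notation, not as the authors' actual implementation:

```python
import torch
import torch.nn as nn

class UrbanSoundCNN(nn.Module):
    """Sketch of a 2-conv + 3-dense classifier; all sizes are assumed."""

    def __init__(self, n_classes=10):
        super().__init__()
        # Two convolutional layers, each with ReLU and max pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, 24, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(24, 48, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Three fully connected (dense) layers; LazyLinear infers the
        # flattened input size from the first forward pass.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = UrbanSoundCNN()
# Batch of 2 clips, 1 channel, 40 MFCC bands x 101 frames (assumed shape).
x = torch.randn(2, 1, 40, 101)
out = model(x)  # logits, one score per class
```

Training such a model would typically add a cross-entropy loss over the logits and a standard optimizer; none of those details are specified in this section.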

1. Introduction

Detection of everyday sounds has countless applications, such as audio surveillance [1], security monitoring in rooms [2] and on public transport [3], autonomous vehicles [4], detection of intruders in wildlife areas [5], medical telemonitoring [6] (e.g., looking after elderly people) and monitoring noise pollution in cities [7, 8]. There is a large body of research on sound classification in fields such as bioacoustics, speech, song and music, but attempts to investigate urban sound environments are exceptionally uncommon [9], because urban sources comprise a variety of heterogeneous sounds arising from the city's acoustic atmosphere, and their patterns also differ extensively [10]. Existing approaches usually focus on categorizing the auditory scene type, such as a park or a street [11–14], as opposed to recognizing particular acoustic sources within those scenes, such as an idling engine, a car horn or a bird tweet. The latter requires greater effort owing to the presence of multiple sources with a variety of sound-producing mechanisms. Further, these sources can be masked by noise, and some are fairly noise-like themselves, for example engine sounds and air conditioners. Most past work on urban sound source classification is built on classical [15], hand-crafted features [16, 17], which have proved over-sensitive to urban background noise [18]. Deep convolutional neural networks are well suited to classifying urban sound [19]: when fed spectrogram-like inputs, they capture energy-modulation patterns over time and frequency, an important quality for differentiating between sounds such as a gunshot and a siren [15]. A convolutional neural network (CNN) is able to learn fruitfully by making use of convolutional filters (kernels) with a modest receptive field.

*For correspondence
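The abstract notes that the model is trained on low-level descriptors "with deltas". A delta captures the local frame-to-frame trend of each feature coefficient, conventionally computed by linear regression over a small window of neighbouring frames. A minimal NumPy sketch follows; the window width and edge padding are assumptions here, and libraries such as librosa offer an equivalent via `librosa.feature.delta`:

```python
import numpy as np

def delta(feat, width=2):
    """First-order delta of a (n_coeffs, n_frames) feature matrix, e.g. MFCCs.

    Each output frame is a regression-style slope over `width` frames on
    each side; the matrix is edge-padded so the output keeps its shape.
    """
    n_frames = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(feat, dtype=float)
    for t in range(n_frames):
        acc = np.zeros(feat.shape[0])
        for k in range(1, width + 1):
            # Weighted difference of frames k steps ahead and behind.
            acc += k * (padded[:, t + width + k] - padded[:, t + width - k])
        out[:, t] = acc / denom
    return out
```

Stacking `feat` and `delta(feat)` along the coefficient axis yields the kind of augmented low-level input the abstract describes.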