Bottom-up broadcast neural network for music genre classification


Caifeng Liu1 · Lin Feng2 · Guochao Liu3 · Huibing Wang4 · Shenglan Liu2

Received: 23 October 2019 / Revised: 30 June 2020 / Accepted: 18 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Music genre classification based on visual representations has been explored successfully in recent years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods adopt mature CNN structures proposed for image recognition without any modification, so the learned features are not adequate for music genre classification. To address this issue, we fully exploit the low-level information in audio spectrograms and develop a novel CNN architecture in this paper. The proposed architecture takes multi-scale time-frequency information into consideration, which passes more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. Experiments are conducted on benchmark datasets including GTZAN, Ballroom, and Extended Ballroom. The results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2%, respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the model trained with our proposed network is tiny, only 0.18M, so it can be deployed on mobile phones or other devices with limited computational resources. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.

Keywords Music genre classification · CNN · Spectrogram

This study was funded by the National Natural Science Foundation of the People’s Republic of China (No. 61672130, No. 61602082, No. 91648205, No. 61627808, No. 61972064), the National Key Scientific Instrument and Equipment Development Project (No. 61627808), and the LiaoNing Revitalization Talents Program (No. XLYC1806006).

Lin Feng

[email protected]
Caifeng Liu [email protected]

1 Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
2 School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
3 Department of Functional RD, JD, Beijing, China
4 College of Information Science and Technology, Dalian Maritime University, Dalian, China

Multimedia Tools and Applications

1 Introduction

With the rapid development of multimedia technology, a tremendous amount of digital audio is uploaded to the Internet. Despite the benefits this audio brings, its explosive growth creates serious problems in many respects. Managing it appropriately is therefore a burdensome task that calls for reliable solutions. Researchers around the world have devoted considerable effort to handling various kinds of audio. Music information retrieval (MIR) is o