Multi-channel spectrograms for speech processing applications using deep learning methods



ORIGINAL ARTICLE

T. Arias‑Vergara1,2,3 · P. Klumpp2 · J. C. Vasquez‑Correa1,2 · E. Nöth2 · J. R. Orozco‑Arroyave1,2 · M. Schuster3

Received: 7 February 2020 / Accepted: 14 September 2020
© The Author(s) 2020

Abstract

Time–frequency representations of speech signals provide dynamic information about how the frequency components change with time. In order to process this information, deep learning models with convolution layers can be used to obtain feature maps. In many speech processing applications, the time–frequency representations are obtained by applying the short-time Fourier transform, and single-channel input tensors are used to feed the models. However, this may limit the potential of convolutional networks to learn different representations of the audio signal. In this paper, we propose a methodology to combine three different time–frequency representations of the signals, computed with the continuous wavelet transform, Mel spectrograms, and Gammatone spectrograms, into 3D-channel spectrograms to analyze speech in two different applications: (1) automatic detection of speech deficits in cochlear implant users and (2) phoneme class recognition to extract phone-attribute features. For this, two different deep learning-based models are considered: convolutional neural networks and recurrent neural networks with convolution layers.

Keywords: Speech processing · Multi-channel spectrograms · Cochlear implants · Phoneme recognition
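As a rough illustration of the kind of input described above, the sketch below assembles a 3-channel spectrogram from the continuous wavelet transform, a Mel spectrogram, and a Gammatone spectrogram. It is a minimal sketch, not the authors' exact pipeline: it assumes librosa and PyWavelets are available, uses the third-party gammatone package (its gtgram function) for the Gammatone channel, and the scale, band, and frame settings are illustrative placeholders rather than values taken from the paper.

import numpy as np
import librosa
import pywt
from scipy.ndimage import zoom
from gammatone import gtgram as gt

def cwt_spectrogram(y, sr, n_scales=64):
    # Continuous wavelet transform with a Morlet mother wavelet.
    scales = np.arange(1, n_scales + 1)
    coefs, _ = pywt.cwt(y, scales, 'morl', sampling_period=1.0 / sr)
    return np.abs(coefs)                       # shape: (n_scales, n_samples)

def mel_spectrogram(y, sr, n_mels=64):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S)              # shape: (n_mels, n_frames)

def gammatone_spectrogram(y, sr, n_bands=64):
    # Third-party gammatone filter bank; arguments are
    # (wave, fs, window_time, hop_time, channels, f_min).
    return gt.gtgram(y, sr, 0.025, 0.010, n_bands, 50)

def three_channel_spectrogram(y, sr, shape=(64, 128)):
    channels = []
    for rep in (cwt_spectrogram(y, sr),
                mel_spectrogram(y, sr),
                gammatone_spectrogram(y, sr)):
        # Resample each representation to a common (frequency, time) grid and
        # normalize it, so the three maps can be stacked like RGB channels.
        rep = zoom(rep, (shape[0] / rep.shape[0], shape[1] / rep.shape[1]))
        rep = (rep - rep.mean()) / (rep.std() + 1e-8)
        channels.append(rep)
    return np.stack(channels, axis=-1)         # shape: (freq, time, 3)

y = librosa.tone(440, sr=16000, duration=1.0)  # toy signal for demonstration
x = three_channel_spectrogram(y, sr=16000)
print(x.shape)                                 # (64, 128, 3)

The resizing and per-channel normalization step is just one simple way to reconcile the different time-frequency resolutions of the three transforms before stacking them into a single input tensor.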

Authors must disclose all relationships or interests that could have direct or potential influence or impart bias on the work.

* Corresponding author: T. Arias‑Vergara, [email protected]

1 Faculty of Engineering, Universidad de Antioquia UdeA, Calle 70 No. 52‑21, Medellín, Colombia
2 Pattern Recognition Lab, Friedrich-Alexander University, Erlangen‑Nürnberg, Germany
3 Department of Otorhinolaryngology, Head and Neck Surgery, Ludwig-Maximilians University, Munich, Germany

1 Introduction

In speech and audio processing applications, the data are commonly processed by computing compressed representations that may not capture the dynamic information of the signals. In recent years, there has been an increasing number of works considering deep learning methods for speech and audio analysis, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), among others [1]. Particularly for CNNs, audio data are processed by feeding the convolution layers with time–frequency representations (spectrograms) of the signals, providing information about how the energy distribution in the frequency domain changes with time. After the convolution operation, the resulting feature maps contain low- and high-level features representing the acoustic information of the signals. Many works have shown the advantages of using CNNs and spectrograms in different speech processing applications, such as automatic detection of disordered speech [2–4] and acoustic models for automatic speech recognition
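To make the preceding description concrete, the following minimal PyTorch sketch (an illustration under assumed tensor sizes, not the architecture evaluated in this work) passes a batch of 3-channel spectrograms through a single convolution block to obtain feature maps.

import torch
import torch.nn as nn

# Batch of 8 spectrogram "images": 3 channels (e.g., CWT, Mel, Gammatone),
# 64 frequency bins, 128 time frames.
x = torch.randn(8, 3, 64, 128)

conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

feature_maps = conv_block(x)    # local time-frequency feature maps
print(feature_maps.shape)       # torch.Size([8, 16, 32, 64])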