Towards robust voice pathology detection




S.I.: ADVANCES IN BIO-INSPIRED INTELLIGENT SYSTEMS

Towards robust voice pathology detection: Investigation of supervised deep learning, gradient boosting, and anomaly detection approaches across four databases

Pavol Harar1 • Zoltan Galaz1 • Jesus B. Alonso-Hernandez2 • Jiri Mekyska1 • Radim Burget1 • Zdenek Smekal1

Received: 10 January 2018 / Accepted: 24 March 2018
© The Natural Computing Applications Forum 2018

Abstract

Automatic, objective, non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking, and even effective treatment of pathological voices. In search of a robust voice pathology detection system, we investigated three distinct classifiers within the supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC), and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize a combination of four different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/, not restricted to a subset of vocal pathologies. Furthermore, to the best of our knowledge, this article is the first to explore gradient-boosted trees and deep learning for this application. The following best classification performances, measured by F1 score on a dedicated test set, were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of an exploratory character, the conducted experiments show the promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.

Keywords Voice pathology detection · Deep learning · Gradient boosting · Anomaly detection
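Since all reported performances are F1 scores, a minimal sketch of how this metric is computed for a binary task may be useful. The function name, the choice of the pathological class as the positive class, and the toy labels are assumptions made for illustration only; they are not taken from the experiments reported here.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration: 1 = pathological, 0 = normophonic.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that achieves one at the expense of the other, which makes it more informative than plain accuracy when the two classes are not equally represented.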

✉ Pavol Harar [email protected]

1 Brno University of Technology, Technicka 3082/12, 61 600 Brno, Czech Republic

2 Institute for Technological Development and Innovation in Communications (IDeTIC), University of Las Palmas de Gran Canaria, Parque Científico Tecnológico de la ULPGC, Polivalente II, Planta 2, 35017 Las Palmas de Gran Canaria, Spain

1 Introduction

Voice pathology can be caused by the presence of tissue infection, systemic changes, mechanical stress, surface irritation, tissue changes, neurological and muscular changes, and other factors [59]. Due to vocal pathology, the mobility, functionality, and shape of the vocal folds are affected, resulting in irregular vibrations and increased acoustic noise. Such a voice sounds strained, harsh, weak, and breathy [27, 58], which significantly contributes to the overall poor voice quality [10, 38]. To date, vocal pathology detection has been approached by subjective and objective evaluations [37]. The first category (subjective evaluation) consists of so-called in-hospital auditory-perceptual and visual examination of t