Wavelet sub-band features for voice disorder detection and classification


Girish Gidaye (1, 2, 3) · Jagannath Nirmal (3) · Kadria Ezzine (4) · Mondher Frikha (4)
Received: 6 December 2019 / Revised: 29 June 2020 / Accepted: 21 July 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Acoustic analysis of the speech signal enables non-intrusive, affordable, unbiased and fast assessment of voice pathologies. This assessment provides complementary information to otolaryngologists for the preliminary diagnosis of a pathological larynx. Several voice impairment assessment systems based on acoustic analysis have been introduced in recent years. However, these systems are tested on only one or two datasets and are not independent of database and human bias. In this paper, a unified wavelet-based framework for evaluating voice disorders is proposed that is independent of database and human bias. The stationary wavelet transform (SWT) is used to decompose the speech signal, since it offers good time and frequency localization. Energy and statistical features are extracted from each sub-band after multilevel decomposition. The higher the decomposition level, the larger the dimension of the feature vector. To reduce this dimension, an information gain (IG) based feature selection technique is used to select the most relevant features and discard redundant ones. The selected feature vector is assessed using support vector machine (SVM), stochastic gradient descent (SGD) and artificial neural network (ANN) classifiers. Recordings of the vowel /a/, produced at natural pitch by both healthy and pathological subjects, are drawn from German, English, Arabic and Spanish speech databases. In the first phase of experiments, the input speech signal is classified as healthy or pathological. The second phase classifies input speech samples as healthy, cyst, paralysis or polyp. Experimental results demonstrate that the extracted energy and statistical features can serve as useful cues for voice disorder evaluation. The most important aspect of the proposed method is that the features are independent of the fundamental frequency. The detection and classification rates attained are comparable to those of other state-of-the-art approaches.
Keywords Voice disorder detection · Stationary wavelet transform · Voice pathology · Statistical features · Feature selection · Information gain
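The SWT feature-extraction and IG-selection steps summarized in the abstract can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Haar wavelet, periodic boundary handling, the specific statistics (mean, standard deviation, skewness, excess kurtosis), the equal-width binning used to estimate information gain, and all function names (`haar_swt`, `subband_features`, `swt_feature_vector`, `information_gain`, `rank_features`) are hypothetical choices made here. In practice a library routine such as PyWavelets' `pywt.swt` would typically replace the hand-written transform.

```python
import math
from collections import Counter


def haar_swt(signal, levels):
    """Stationary (undecimated) Haar wavelet transform via the a trous scheme.

    Returns one (approximation, detail) pair per level; every band keeps the
    input length because the filters are dilated instead of downsampling.
    Boundaries use periodic (circular) extension. Haar is an assumed choice.
    """
    n = len(signal)
    approx = [float(x) for x in signal]
    bands = []
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)  # distance between the two Haar taps at level j
        a_next = [(approx[i] + approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        d_next = [(approx[i] - approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        bands.append((a_next, d_next))
        approx = a_next  # the next level decomposes the approximation band
    return bands


def subband_features(coeffs):
    """Energy, mean, standard deviation, skewness and excess kurtosis of one band."""
    n = len(coeffs)
    energy = sum(c * c for c in coeffs)
    mean = sum(coeffs) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)
    if std < 1e-12:
        skew = kurt = 0.0  # undefined for a (near-)constant band; 0 by convention here
    else:
        skew = sum((c - mean) ** 3 for c in coeffs) / (n * std ** 3)
        kurt = sum((c - mean) ** 4 for c in coeffs) / (n * std ** 4) - 3.0
    return [energy, mean, std, skew, kurt]


def swt_feature_vector(signal, levels):
    """Features of every detail band plus the deepest approximation band."""
    bands = haar_swt(signal, levels)
    features = []
    for _, detail in bands:
        features.extend(subband_features(detail))
    features.extend(subband_features(bands[-1][0]))
    return features  # length 5 * (levels + 1)


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """IG(Y; X) = H(Y) - H(Y | X), with X discretized into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # a constant feature falls into a single bin
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    h_cond = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        h_cond += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_cond


def rank_features(feature_matrix, labels, bins=4):
    """Feature indices sorted by decreasing information gain, plus the scores."""
    n_features = len(feature_matrix[0])
    scores = [information_gain([row[k] for row in feature_matrix], labels, bins)
              for k in range(n_features)]
    order = sorted(range(n_features), key=lambda k: scores[k], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data (not from the paper): smooth ramps vs. rapidly alternating signals.
    smooth = [[0.1 * i + s for i in range(16)] for s in range(4)]
    rough = [[(-1.0) ** i * (1 + s) for i in range(16)] for s in range(4)]
    X = [swt_feature_vector(sig, 3) for sig in smooth + rough]
    y = [0] * 4 + [1] * 4
    order, scores = rank_features(X, y)
    print("top 5 features:", order[:5])
```

With `levels` decomposition levels the sketch produces 5 * (levels + 1) values (one detail band per level plus the deepest approximation band), so the feature vector grows with depth, as the abstract notes; keeping only the top-ranked indices returned by `rank_features` is the reduction step that would precede a classifier such as an SVM.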

Girish Gidaye (corresponding author)

Extended author information available on the last page of the article.

Multimedia Tools and Applications

1 Introduction
Speech is the natural means of human communication in everyday life. The primary speech production organs, including the larynx, play an essential role in producing intelligible speech. A distortion of the normal speech flow caused by a pathological condition is known as a voice disorder [43]. Any such disorder impairs the functioning of the speech production system and thus produces a distorted sound. Voice disorders can be categorized as: (1) Organic, (2) Functional and