Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech
G. Diwakar1 · Veena Karjigi1
Received: 11 May 2019 / Revised: 3 April 2020 / Accepted: 6 April 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Alignment of a transcription to speech finds applications in video subtitling, human–computer interaction by means of natural language communication, etc. In spite of many advancements, aligning a transcription to speech remains a challenging task, and it may become even more challenging for dysarthric speech. Dysarthria is a motor speech disorder that results from damage to the peripheral or central nervous system and causes a slow speaking rate, pronunciation deviations, and prolonged pauses between words and syllables. One of the problems in aligning dysarthric speech to text is the presence of repetition, which can occur at the syllable, word, or phrase level. In this work, we propose an algorithm for syllable boundary detection followed by syllable repetition detection in dysarthric speech. When a syllable is found to be repeated, that syllable is automatically repeated in the transcription as well. The modified transcription is given to the aligner along with the dysarthric speech. When tested for word alignment on 15 utterances containing 146 words, the proposed system gave a root mean square error (RMSE) of 0.138, compared with an RMSE of 0.276 for the existing work in the literature.
Keywords Alignment · Dysarthria · Repetition · Transcription
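The two ideas in the abstract that lend themselves to a small illustration are the transcription update and the RMSE evaluation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `repeat_syllables` and `rmse`, the dictionary format for detector output, and the example syllables are all hypothetical, and the syllable boundary and repetition detectors themselves are not reproduced here.

```python
import math


def repeat_syllables(syllables, extra_counts):
    """Duplicate syllables in a transcription to mirror spoken repetitions.

    syllables:    list of syllable strings from the original transcription.
    extra_counts: dict mapping a syllable index to the number of EXTRA times
                  it was uttered (hypothetical output of a repetition detector).
    """
    modified = []
    for index, syllable in enumerate(syllables):
        # Emit the syllable once, plus once per detected repetition.
        modified.extend([syllable] * (1 + extra_counts.get(index, 0)))
    return modified


def rmse(reference, predicted):
    """Root mean square error between two equal-length lists of boundaries."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative input only: the first syllable was spoken twice.
print(repeat_syllables(["pa", "per"], {0: 1}))  # ['pa', 'pa', 'per']
```

The modified syllable list would then be passed to the aligner in place of the original transcription, and `rmse` would compare manually marked word boundaries against the aligner's output.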
1 Introduction
Human beings are social animals, and this very nature requires them to share their ideas, emotions, and feelings with one another. Several modes of communication are possible, chiefly sign language and natural language. Of these two,
1 Department of Electronics and Communication, Siddaganga Institute of Technology, Tumakuru, Karnataka, India
Circuits, Systems, and Signal Processing
the use of natural language is the most effective mode of communication. However, people with neurological disorders find it difficult to speak fluently; dysarthria is one such case. Dysarthria is a motor speech disorder resulting from damage to the central or peripheral nervous system. It leads to paralysis, weakness, and lack of coordination in the motor speech system, and it also impairs movement of the limbs, which rules out the use of sign language. A person with dysarthria often has a slow speaking rate, prolonged pauses between words and syllables, pronunciation deviations, and disfluencies in the speech signal, which make the speech unintelligible to varying degrees. Even with the severe speech production disabilities exhibited by many dysarthric speakers, speech communication requires less effort and is faster than typing-based methods [8]. Only a limited number of databases are available for work on dysarthric speech, such as the Nemours database [13], Un