An Automatic Phonetic Aligner for Brazilian Portuguese with a Praat Interface

Abstract. The analysis of the phonetic entities of speech nearly always requires the alignment of an audio file with its phonetic transcription. However, this is an extremely labor-intensive task. An automatic alignment tool has language-dependent modules and, while there are many public resources for some languages (e.g., English and French), the resources for Brazilian Portuguese (BP) are still limited. This work describes the development of an automatic phonetic alignment tool for BP, consisting of a grapheme-to-phone converter, a syllabification system and HTK-based acoustic models. This aligner is implemented and freely distributed as a Praat plug-in. Performance tests are presented, comparing the current proposal with an existing tool.

Keywords: Phonetic alignment · Brazilian Portuguese · Pronunciation dictionary · Syllabification · HTK · Praat
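The core operation named in the abstract, aligning an audio file with a known phonetic transcription, is what HMM-based aligners do in forced-alignment mode: the phone sequence is fixed, and a Viterbi search only decides where each phone starts and ends. The Python sketch below illustrates that idea under simplifying assumptions made for illustration only, not taken from the paper: each phone is a single state (HTK-style models typically use several states with Gaussian-mixture emissions), transitions are uniform, and the per-frame log-likelihoods `log_lik` are a hypothetical, precomputed input. The name `forced_align` is likewise hypothetical.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def forced_align(
    phones: Sequence[str],
    log_lik: Sequence[Sequence[float]],
) -> List[Tuple[str, int, int]]:
    """Viterbi forced alignment of a known phone sequence to T frames.

    Assumption: log_lik[t][i] is the precomputed log-likelihood of frame t
    under the model of the i-th phone of the transcription. Each phone is a
    single state; the path may only stay in a phone or move to the next one,
    and every phone must cover at least one frame.
    Returns (phone, first_frame, last_frame) triples, both ends inclusive.
    """
    n_frames, n_phones = len(log_lik), len(phones)
    if n_phones == 0 or n_frames < n_phones:
        raise ValueError("need a non-empty transcription and >= 1 frame per phone")

    # score[t][i]: best log score of any path that is in phone i at frame t
    score = [[NEG_INF] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # the path must start in the first phone

    for t in range(1, n_frames):
        for i in range(n_phones):
            stay = score[t - 1][i]
            advance = score[t - 1][i - 1] if i > 0 else NEG_INF
            if stay >= advance:
                best, back[t][i] = stay, i
            else:
                best, back[t][i] = advance, i - 1
            if best > NEG_INF:
                score[t][i] = best + log_lik[t][i]

    # The path must end in the last phone; trace the best predecessors back.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()

    # Collapse the frame-level state path into one segment per phone.
    segments = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            segments.append((phones[path[t - 1]], start, t - 1))
            start = t
    return segments


# Toy example: two phones, four frames.
print(forced_align(["s", "a"], [[0, -5], [0, -5], [-5, 0], [-5, 0]]))
```

With these toy scores the sketch returns `[("s", 0, 1), ("a", 2, 3)]`. Multiplying the frame indices by the frame shift (e.g., an assumed 10 ms) gives the time boundaries that an aligner would write to a Praat TextGrid tier.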

1 Introduction

Automatic speech recognition (ASR) and speech synthesis (TTS) are data-driven technologies that require a relatively large amount of labeled data. As a consequence, many large speech corpora have been collected for speech technology development in recent years. These corpora need to be phonetically segmented with a high level of precision, i.e., the phones must be time-aligned with the sound; otherwise the quality of a synthesized voice, for example, may be impaired. Likewise, the analysis of the prosodic structure of speech requires knowing the precise position of the phonetic temporal boundaries [1]. However, manual phonetic segmentation is time-consuming, taking more than 13 h for a one-minute recording [2], and expensive, since it requires trained language experts.

The most widely explored phonetic alignment techniques are based either on hidden Markov models (HMM) used in forced-alignment mode or on dynamic time warping alignment with synthesized speech (TTS+DTW) [3]. A comparison of these two approaches in [4] showed that TTS+DTW segmentation is in general more accurate than HMM-based segmentation, whereas HMM-based phonetic aligners are more reliable. Hence, a hybrid system was proposed in [5]. Its results with Portuguese voice data suggest that combining HMM-based and TTS+DTW alignment tools can be worthwhile, as the former are more robust and the latter are more accurate.

In this context, automatic alignment tools such as EasyAlign [6], SPPAS [7], P2FA [8] and Train&Align [9] have been developed and released. In addition to a phonetic dictionary, all these tools rely on HMM-based acoustic modeling of the language. Either they provide the user with pre-existing speaker-independent models for each language, or models of each phoneme (monophone models) or of groups of phonemes (triphone models) are trained directly on the corpus to be aligned. These models are then used to align an audio file with its phonetic transcription. P2FA is an open-source automatic phonetic alignme

© Springer International Publishing Switzerland 2016
J. Silva et al. (Eds.): PROPOR 2016, LNAI 9727, pp. 374–384, 2016.
DOI: 10.1007/978-3-319-41552-9_38
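The TTS+DTW approach mentioned in the introduction can be sketched in the same hedged spirit. Because the synthetic utterance is generated by the TTS system, its phone boundaries are known; dynamic time warping (DTW) finds a monotonic frame-to-frame mapping between the synthetic and the natural feature sequences, and each known boundary is projected through that mapping. The Python sketch below assumes precomputed feature vectors (e.g., MFCCs) for both signals and uses plain Euclidean frame distances; `dtw_path` and `project_boundaries` are hypothetical names, and this is not the implementation evaluated in [4] or [5].

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(x: Sequence[Frame], y: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Classic DTW with steps (1,0), (0,1), (1,1) and Euclidean frame cost.

    Returns the optimal warping path as 0-based (x_index, y_index) pairs,
    from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    # cost[i][j]: minimal accumulated cost aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from (n, m) to (1, 1), always taking the cheapest predecessor
    # (ties favor the diagonal step because it is listed first).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        valid = [s for s in steps if s[0] >= 1 and s[1] >= 1]
        i, j = min(valid, key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def project_boundaries(
    synth_bounds: Sequence[int], path: Sequence[Tuple[int, int]]
) -> List[int]:
    """Map each known boundary frame of the synthetic signal to the first
    natural-speech frame that the warping path pairs with it."""
    first_match: Dict[int, int] = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in synth_bounds]


# Toy 1-D "features": the synthetic second phone starts at frame 2.
synth = [[0.0], [0.0], [5.0], [5.0]]
natural = [[0.0], [0.0], [0.0], [5.0], [5.0], [5.0]]
print(project_boundaries([2], dtw_path(synth, natural)))
```

In this toy case the boundary at synthetic frame 2 is projected to natural frame 3, where the second "phone" actually begins. The sketch evaluates every cell of the cost matrix (O(nm)); practical systems usually add slope constraints or a band around the diagonal.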
