A NooJ Tunisian Dialect Translator
1 Faculty of Economic Sciences and Management of Sfax, Miracl Laboratory, University of Sfax, Sfax, Tunisia
2 Miracl Laboratory, Institute of Computer Sciences and Communications of Hammam Sousse, Sousse, Tunisia
3 Faculty of Sciences of Sfax, Miracl Laboratory, University of Sfax, Sfax, Tunisia
Abstract. Building a system that translates Arabic dialects into modern standard Arabic has become an important task in Natural Language Processing applications in recent years. In this context, we are interested in building a translator from Tunisian dialect to modern standard Arabic. Tunisian dialect is a variant of Arabic that nevertheless differs from modern standard Arabic, and it is difficult for non-Tunisian people to understand. To build our translator, we study several Tunisian dialect corpora to identify and investigate different phenomena, such as the morphology of Tunisian dialect words and the structure of Tunisian dialect sentences. The proposed translation method is based on a bilingual dictionary extracted from the study corpus and an elaborated set of local grammars. These local grammars are transformed into finite-state transducers using new technologies of the NooJ linguistic platform. To test and evaluate the designed translator, we apply it to a Tunisian dialect test corpus containing more than 18,000 words. The obtained results are encouraging.
Keywords: Bilingual dictionary · Finite transducer · Tunisian dialect · MSA · Word-to-word translation
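To make the dictionary-based, word-to-word part of the method concrete, the following is a minimal sketch in Python, not the NooJ implementation. The lexicon `TD_TO_MSA` and the function `translate_word_to_word` are hypothetical names, and the three entries are common examples chosen for illustration rather than taken from the dictionary built in this work. The sketch performs lookup only and does not attempt the inflection and word-order handling that the local grammars are meant to address.

```python
# Hypothetical toy lexicon (illustrative entries, not the paper's dictionary).
TD_TO_MSA = {
    "برشا": "كثيرا",  # barsha -> kathiran ("a lot")
    "توا": "الآن",    # tawa   -> al-aan   ("now")
    "باهي": "جيد",    # behi   -> jayyid   ("good")
}


def translate_word_to_word(sentence, lexicon=TD_TO_MSA):
    """Replace each whitespace-separated token by its lexicon entry.

    Tokens that are missing from the lexicon are copied unchanged,
    so the output always has as many tokens as the input.
    """
    tokens = sentence.split()
    return " ".join(lexicon.get(token, token) for token in tokens)


if __name__ == "__main__":
    # Translate a two-token input built from the toy lexicon keys.
    print(translate_word_to_word("برشا توا"))
```

Copying unknown tokens through unchanged is a design choice of this sketch: it keeps the output aligned with the input, which makes gaps in the lexicon easy to spot.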
R. Torjmen et al. In: H. Fehri et al. (Eds.): NooJ 2019, CCIS 1153, pp. 123–134, 2020. https://doi.org/10.1007/978-3-030-38833-1_11 © Springer Nature Switzerland AG 2020
1 Introduction
A Tunisian Dialect translator is a beneficial tool in the domain of Natural Language Processing (NLP). Indeed, it facilitates the diffusion of Tunisian Dialect (TD) to the Arab world. Moreover, thanks to existing translators from Modern Standard Arabic (MSA) to other languages, translating TD into another language becomes easy once it has been rendered in MSA. Besides, our translation system helps in several fields, such as the subtitling of Tunisian Dialect artistic works (films, series and novels) and communication with dialogue systems (for example, an Automated Teller Machine, ATM). Unfortunately, TD is not taught in Tunisian schools, which results in the absence of a standard spelling. Furthermore, TD words originate from a mixture of several languages, such as Arabic, French, Ottoman, Italian, Amazigh and Maltese. In addition, popular vocabulary is constantly evolving because of rap songs, the increasing use of social networks and new technologies.
During the construction of our translation system, we encountered many problems. Among them, the target language, compared to TD, has different word inflections as well as a different word order. In this context, our principal objective is to build a translator from TD to MSA. To achieve this goal, we need to carry out several steps. The first one is to provide a deep linguistic study for TD sentences