A Platform Based ANLP Tools for the Construction of an Arabic Historical Dictionary



Abstract. In this paper, we offer linguists a method that facilitates the creation of a standard Arabic historical dictionary, in order to make up for lost time and catch up with other languages. We propose a platform of Automatic Natural Language Processing (ANLP) tools that supports automatic indexing of, and search over, a corpus of Arabic texts. Indexing is applied after several preprocessing steps: segmentation, normalization, filtering, and morphological analysis. The prototype we have developed for generating a standard Arabic historical dictionary extracts contexts from the input corpus and lets the user assign meanings to them. The evaluation of our system shows that the results are reliable.

Keywords: Historical dictionary · Platform · ANLP · Indexation · Morphological analyzer · Stemming · Arabic · Corpus
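The preprocessing and indexing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' platform: the function names, the tiny stop-word list, and the normalization rules (stripping diacritics and tatweel, unifying alef variants) are assumptions chosen for illustration, and the morphological analysis step is left out entirely, so words are indexed by their normalized surface form rather than by stem or root.

```python
import re
from collections import defaultdict

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Hypothetical, tiny stop-word list used only for illustration.
STOP_WORDS = {"في", "من", "على"}


def normalize(text):
    """Strip diacritics/tatweel and unify common alef variants."""
    text = DIACRITICS.sub("", text)
    return re.sub("[أإآ]", "ا", text)


def segment(text):
    """Split text into tokens made of Arabic letters."""
    return re.findall(r"[\u0621-\u064A]+", text)


def build_index(corpus):
    """Map each non-stop-word token to its (doc_id, position) occurrences."""
    index = defaultdict(list)
    tokens_by_doc = {}
    for doc_id, text in corpus.items():
        tokens = segment(normalize(text))
        tokens_by_doc[doc_id] = tokens
        for pos, tok in enumerate(tokens):
            if tok not in STOP_WORDS:  # filtering step
                index[tok].append((doc_id, pos))
    return index, tokens_by_doc


def contexts(word, index, tokens_by_doc, window=2):
    """Return the words surrounding each occurrence of `word`."""
    word = normalize(word)
    out = []
    for doc_id, pos in index.get(word, []):
        toks = tokens_by_doc[doc_id]
        start = max(0, pos - window)
        out.append((doc_id, " ".join(toks[start:pos + window + 1])))
    return out
```

Given a corpus such as `{"d1": "كَتَبَ الولد في الكتاب"}`, `build_index` returns the index together with the token lists, and `contexts("الولد", index, tokens_by_doc, window=1)` returns the occurrence with one word of context on each side. A real system would insert a stemmer or morphological analyzer between segmentation and indexing.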



1 Introduction

The need to better understand the present-day language, to recognize the culture of each period, and to keep linguistic study abreast of the language's rapid evolution obliges linguists to develop large reference works for their languages, covering their large vocabularies and presenting how they have evolved over time: this is the historical dictionary. Fortunately, many linguists recognized this need long ago and set out to record the history of their languages by developing historical dictionaries, such as the French historical dictionary and the English one. Unfortunately, the Arabic language has not yet been recorded in this way despite its importance and richness, because of its complexity and the semi-algorithmic nature of its morphology, which employs numerous rules and constraints on inflection, derivation, and cliticization. That is why we propose in this paper a solution that can help linguists make up for the time already lost and build their own database presenting the Arabic vocabulary from its origins and describing its evolution historically and geographically.

© Springer International Publishing Switzerland 2016 E. Métais et al. (Eds.): NLDB 2016, LNCS 9612, pp. 239–248, 2016. DOI: 10.1007/978-3-319-41754-7_21


F. Khalfallah et al.

2 State of the Art

A dictionary is a reference database containing a set of words of a language or of a domain of activity, generally presented in alphabetical order. In general, a dictionary indicates the root, the definition, the spelling, the sense, and the syntax of each entry [1, 2]. The general structure of a dictionary can take the form {Key = Description}, where the keys are generally words of the language and the descriptions are sets of words giving the definitions, the explanations, or the correspondences (synonym, antonym, translation, etymology) [1, 3]. There are two main types of dictionaries: the classical and the electronic. Their content is necessarily similar; the main difference between them lies not only in the information presented but also in how it is used, how the content is displayed, and the search capacity. In fact, Arabic dictionaries started from an ea