Study and Resolution of Arabic Lexical Ambiguity Through Transduction on Text Automaton




N. Ghezaiel and K. Haddar

1 Higher Institute of Computer and Communication Technologies of Hammam Sousse, Miracl Laboratory, Sousse, Tunisia
[email protected]
2 Faculty of Sciences of Sfax, Miracl Laboratory, University of Sfax, Sfax, Tunisia
[email protected]

Abstract. Lexical analysis can be a way to remove ambiguities in the Arabic language, so their resolution is an important task in several domains of Natural Language Processing (NLP). This paper is situated in that context. Our proposed resolution method is based essentially on the use of transducers on text automata. These transducers specify lexical rules of the Arabic language that allow corpus disambiguation. To realize our resolution method, different types of lexical ambiguities are identified and studied, and an appropriate set of rules is proposed. We then represent all specified rules in NooJ. In addition, we present experimentation conducted on the NooJ platform with various linguistic resources to obtain disambiguated syntactic structures suitable for analysis. The obtained results are promising and can be improved by adding further rules and heuristics.

Keywords: Lexical ambiguity · Arabic lexical rule · NooJ transducer · Text annotation structure

1 Introduction

The need for disambiguation appears in several steps of analysis and in applications such as syntactic analysis, named entity recognition and morphological analysis. Disambiguation can be performed at different levels: morphological, syntactic and lexical. Indeed, disambiguating an Arabic corpus can greatly facilitate several parsing processes, which largely reduces parsing time for researchers. A successful resolution requires a rigorous study of the Arabic language to facilitate the identification of rules that can be formalized through different frameworks. Many theoretical platforms allow such formalization, such as grammars and finite-state machines. In fact, finite automata, and particularly transducers, are increasingly used in NLP. Thanks to transducers, several local linguistic phenomena (e.g., recognition of named entities, morphological analysis) are treated appropriately. Transduction on text automata is especially useful: it can remove paths representing morpho-syntactic ambiguities. Moreover, to formalize lexical rules, we need adequate criteria to classify them in a specific order of application and to define sufficiently fine-grained lexical categories allowing the identification of efficient rules. Through these classifications, we aim to optimize the interaction between rules and to identify disambiguation methods that can be exploited by other steps of analysis. In this context, our objectives are to study Arabic lexical ambiguities and to implement a lexical disambiguation tool for the Arabic language with the NooJ platform.

© Springer International Publishing Switzerland 2016. T. Okrut et al. (Eds.): NooJ 2015, CCIS 607, pp. 123–133, 2016. DOI: 10.1007/978-3-319-42471-2_11
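The idea of pruning ambiguous paths in a text automaton can be sketched in a few lines of code. The sketch below is only an illustration of the general technique, not the authors' actual NooJ grammars or Arabic rules: a text automaton is modeled as a list of slots, each holding the alternative lexical analyses of one token, and a rule forbids an analysis in a given left context. All tags and rules in the example are invented.

```python
# Minimal sketch of lexical disambiguation by path pruning on a text
# automaton. Each slot of `lattice` holds the candidate tags of one token;
# a rule (prev_tag, tag) means `tag` is impossible right after `prev_tag`.

def apply_rules(lattice, rules):
    """Prune analyses whose (previous-tag, tag) pair is forbidden by a rule."""
    pruned = [set(slot) for slot in lattice]
    changed = True
    while changed:  # iterate to a fixed point, since pruning can cascade
        changed = False
        for i in range(1, len(pruned)):
            prev_tags = pruned[i - 1]
            for tag in list(pruned[i]):
                # Drop `tag` only if every surviving previous tag forbids it
                # and at least one alternative analysis remains.
                if len(pruned[i]) > 1 and all((p, tag) in rules for p in prev_tags):
                    pruned[i].discard(tag)
                    changed = True
    return pruned

# Toy lattice: the second token is ambiguous between NOUN and VERB.
lattice = [{"DET"}, {"NOUN", "VERB"}, {"ADJ"}]
# Invented rule: a VERB cannot directly follow a DET.
rules = {("DET", "VERB")}
print(apply_rules(lattice, rules))  # [{'DET'}, {'NOUN'}, {'ADJ'}]
```

The fixed-point loop mirrors what happens when several disambiguation transducers are applied in sequence: removing one path can make further rules applicable.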