Study and Resolution of Arabic Lexical Ambiguity Through Transduction on Text Automaton




N. Ghezaiel and K. Haddar

1 Higher Institute of Computer and Communication Technologies of Hammam Sousse, Miracl Laboratory, Sousse, Tunisia
[email protected]
2 Faculty of Sciences of Sfax, Miracl Laboratory, University of Sfax, Sfax, Tunisia
[email protected]

Abstract. Lexical analysis can be a way to remove ambiguities in the Arabic language, so their resolution is an important task in several domains of Natural Language Processing (NLP). This paper is situated in that context. Our proposed resolution method is based essentially on the use of transducers on text automata. These transducers specify lexical rules of the Arabic language that allow corpus disambiguation. To realize our resolution method, different types of lexical ambiguities are identified and studied, and an appropriate set of rules is proposed. We then represent all specified rules in NooJ. In addition, we present experimentation conducted on the NooJ platform with various linguistic resources to obtain disambiguated syntactic structures suitable for analysis. The obtained results are promising and can be improved by adding further rules and heuristics.

Keywords: Lexical ambiguity · Arabic lexical rule · NooJ transducer · Text annotation structure

1 Introduction

The need for disambiguation appears in several steps of analysis and in applications such as syntactic analysis, named entity recognition and morphological analysis. Disambiguation can be performed at different levels: morphological, syntactic and lexical. Indeed, disambiguating an Arabic corpus can greatly facilitate several parsing processes, which largely reduces parsing time for researchers. A successful resolution requires a rigorous study of the Arabic language to facilitate the identification of rules that can be formalized through different frameworks. Many theoretical platforms allow such formalization, such as grammars and finite-state machines. In fact, finite automata, and particularly transducers, are increasingly used in NLP. Thanks to transducers, several local linguistic phenomena (e.g., recognition of named entities, morphological analysis) are treated appropriately. Transduction on text automata is especially useful: it can remove paths representing morpho-syntactic ambiguities. Moreover, to formalize lexical rules, we need adequate criteria to classify them in a specific order of application and to define sufficiently fine-grained lexical categories allowing the identification of efficient rules. Through these classifications, we aim to optimize the interaction between rules and to identify disambiguation methods that can be exploited by other steps of analysis. In this context, our objectives are to study Arabic lexical ambiguities and to implement a lexical disambiguation tool for the Arabic language with the NooJ platform.

© Springer International Publishing Switzerland 2016. T. Okrut et al. (Eds.): NooJ 2015, CCIS 607, pp. 123–133, 2016. DOI: 10.1007/978-3-319-42471-2_11
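The idea of pruning ambiguous paths in a text automaton can be sketched in a few lines of code. The sketch below is only an illustration of the general technique, not the authors' actual NooJ grammars or Arabic rules: a text automaton is modeled as a list of slots, each holding the alternative lexical analyses of one token, and a rule forbids an analysis in a given left context. All tags and rules in the example are invented.

```python
# Minimal sketch of lexical disambiguation by path pruning on a text
# automaton. Each slot of `lattice` holds the candidate tags of one token;
# a rule (prev_tag, tag) means `tag` is impossible right after `prev_tag`.

def apply_rules(lattice, rules):
    """Prune analyses whose (previous-tag, tag) pair is forbidden by a rule."""
    pruned = [set(slot) for slot in lattice]
    changed = True
    while changed:  # iterate to a fixed point, since pruning can cascade
        changed = False
        for i in range(1, len(pruned)):
            prev_tags = pruned[i - 1]
            for tag in list(pruned[i]):
                # Drop `tag` only if every surviving previous tag forbids it
                # and at least one alternative analysis remains.
                if len(pruned[i]) > 1 and all((p, tag) in rules for p in prev_tags):
                    pruned[i].discard(tag)
                    changed = True
    return pruned

# Toy lattice: the second token is ambiguous between NOUN and VERB.
lattice = [{"DET"}, {"NOUN", "VERB"}, {"ADJ"}]
# Invented rule: a VERB cannot directly follow a DET.
rules = {("DET", "VERB")}
print(apply_rules(lattice, rules))  # [{'DET'}, {'NOUN'}, {'ADJ'}]
```

The fixed-point loop mirrors what happens when several disambiguation transducers are applied in sequence: removing one path can make further rules applicable.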