Amharic Sentence Parsing Using Base Phrase Chunking

Abstract. Parsing plays a significant role in many natural language processing (NLP) applications, as their efficiency relies on having an effective parser. This paper presents an Amharic sentence parser developed using a base phrase chunker that groups syntactically correlated words at different levels. We use a hidden Markov model (HMM) to chunk base phrases, and incorrectly chunked phrases are pruned with rules. Parsing is then performed by taking the chunk results as input. A bottom-up approach with a transformation algorithm is used to transform the chunker into the parser. A corpus compiled from Amharic news outlets and books was collected for training and testing. The training and testing datasets were prepared using the 10-fold cross-validation technique. Test results on the test data showed an average parsing accuracy of 93.75%.

Keywords: Amharic Parsing, Base Phrase Chunking, Bottom-up Parsing.
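The HMM chunking step summarized above can be illustrated with a toy Viterbi decoder. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the hidden states are IOB-style chunk tags, limited here to the hypothetical tags B-NP, I-NP and B-VP. It also assumes that the observations are POS tags, limited here to the hypothetical tags ADJ, N and V. All start, transition and emission probabilities are invented toy values, not estimates from the Amharic corpus. The helper `group_chunks`, which is also hypothetical, turns the decoded tags into base phrases of the kind the parser takes as input. The rule-based pruning and the bottom-up transformation are not shown.

```python
import math

# Hypothetical chunk tags (hidden states). These tag names and all
# probabilities below are illustrative assumptions, not the paper's values.
STATES = ["B-NP", "I-NP", "B-VP"]

START = {"B-NP": 0.8, "I-NP": 0.0, "B-VP": 0.2}

TRANS = {
    "B-NP": {"B-NP": 0.1, "I-NP": 0.6, "B-VP": 0.3},
    "I-NP": {"B-NP": 0.1, "I-NP": 0.3, "B-VP": 0.6},
    "B-VP": {"B-NP": 0.5, "I-NP": 0.0, "B-VP": 0.5},
}

# Emission probabilities of (hypothetical) POS tags given a chunk tag.
EMIT = {
    "B-NP": {"ADJ": 0.4, "N": 0.5, "V": 0.1},
    "I-NP": {"ADJ": 0.1, "N": 0.8, "V": 0.1},
    "B-VP": {"ADJ": 0.05, "N": 0.05, "V": 0.9},
}


def log(p):
    """Log-probability that maps zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most probable chunk-tag sequence for a list of POS tags."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: log(START[s]) + log(EMIT[s].get(observations[0], 0)) for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for s at position t
            prev, score = max(
                ((p, best[t - 1][p] + log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(EMIT[s].get(observations[t], 0))
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def group_chunks(words, tags):
    """Group words into (phrase_label, [words]) pairs using B-/I- chunk tags."""
    chunks = []
    for word, tag in zip(words, tags):
        prefix, label = tag.split("-", 1)
        # An I- tag continues the current phrase if the labels match
        if prefix == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks


tags = viterbi(["ADJ", "N", "V"])
print(tags)
print(group_chunks(["w1", "w2", "w3"], tags))
```

For the POS sequence ADJ N V, the decoder returns B-NP, I-NP, B-VP. This groups the placeholder words w1 and w2 into one noun phrase, followed by a verb phrase containing w3. Computing in log space avoids numerical underflow on long sentences.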

1 Introduction

To process and understand natural languages, the linguistic structure of a text must be organized at different levels. A structured text increases the capability of NLP applications [2], [4]. The syntactic level of linguistic analysis concerns how words are put together to form correct sentences and determines what structural role each word plays in the sentence. Broadly speaking, syntactic analysis of a sentence consists of segmenting the sentence into words, grouping these words into syntactic structural units, and recognizing the syntactic elements and their relationships within a structure. The syntactic level also indicates how words are grouped together into phrases, which words modify other words, and which words are of central importance in the sentence [2], [7]. Parsing can be described as a procedure that searches through the various ways of combining grammatical rules to find a combination that generates a tree representing the syntactic structure of the input sentence. Parsing uses the syntax of a language to determine the functions of the words in a sentence, in order to generate a data structure that helps in analyzing the meaning of the sentence [7]. In addition, parsing deals with a number of subproblems, such as identifying constituents that can fit together. In general, parsing helps in understanding how words are put together to form correct phrases or sentences, along with the structural roles of the words. It plays a significant role in many NLP applications because it helps to reduce the overall structural complexity of sentences [13]. NLP applications that use a parser as a component include semantic analysis, grammar checking, automatic abstracting, text summarization, and machine translation.

A. Gelbukh (Ed.): CICLing 2014, Part I, LNCS 8403, pp. 297–306, 2014. © Springer-Verlag Berlin Heidelberg 2014

A. Ibrahim and Y. Assabie

Over the years, many algorithms have been proposed for parsing, and they can be broadly classified into two types: top-down and bottom-up parsing. Top-down parsing starts with the sentence and then applies the grammar