ASSESSMENT OF TEXT COHERENCE BY CONSTRUCTING THE GRAPH OF SEMANTIC, LEXICAL, AND GRAMMATICAL CONSISTENCY OF PHRASES OF SENTENCES
S. D. Pogorilyy1† and A. A. Kramov1‡
UDC 004.83
Abstract. A graph-based method for assessing text coherence is proposed, based on the analysis of the semantic, grammatical, and lexical consistency of sentence phrases. The efficiency of the method has been verified experimentally on English-language corpora. The metrics obtained indicate that the proposed method outperforms other modern approaches. The method can be applied to other languages by replacing the linguistic models according to the features of the target language.
Keywords: natural language processing, assessment of text coherence, bipartite graph of phrases, graph-based method of coherence assessment of texts, lexical and grammatical consistency of sentences.
INTRODUCTION
Natural language processing (NLP) is one of the research directions in the field of artificial intelligence. Solving most NLP problems requires human resources (expert knowledge), i.e., problems of this type cannot be solved by a fixed algorithm alone. Language synthesis and recognition, syntax analysis, plagiarism detection, sentiment analysis, etc., belong to this problem class. Thus, the problem arises of formalizing natural language texts and of identifying patterns between their components according to the expected output.
With the steady growth of available computing power, various combinations of machine learning and computational linguistics methods [1] are used to solve these problems. It is therefore possible to train a model on already formed corpora (bodies of textual information) and then apply it to a test sample. However, the inhomogeneity of textual information (different structure, sentence length, and semantic dependence of sentences on the preceding ones) and the variety of its content complicate the design and parameter estimation of machine learning models; therefore, solving NLP problems that analyze the semantic and grammatical features of text remains important.
Text coherence assessment belongs to this problem type. According to the definition in [2], text coherence is a grammatical and lexical connection between the text components. Coherence ensures that the main idea is conveyed to the reader sequentially throughout the text, making the text more understandable and easier to perceive. This communicative connection between the writer and the reader is achieved through semantic integrity.
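The notion of coherence as a lexical connection between text components can be illustrated with a toy score. The sketch below is an assumption made purely for illustration and is not the method proposed in the paper: it treats sentences as graph vertices, links each sentence to the next one, and weights every edge by the Jaccard overlap of the two word sets. The function names (tokenize, jaccard, coherence_score) are hypothetical, and only lexical overlap is modeled; semantic and grammatical consistency are left out.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence_score(sentences):
    """Mean edge weight between adjacent sentences.

    Assumption: lexical overlap is used as a crude stand-in for the
    consistency measure; this is not the authors' graph construction.
    """
    tokens = [tokenize(s) for s in sentences]
    if len(tokens) < 2:
        return 0.0  # a single sentence has no edges to score
    weights = [jaccard(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    related = ["The cat sat on the mat.", "The cat liked the mat."]
    unrelated = ["The cat sat on the mat.", "Stock prices fell sharply."]
    print(coherence_score(related))
    print(coherence_score(unrelated))
```

Under this simplification, a pair of sentences that share vocabulary scores higher than a pair with no words in common; a real system would replace the overlap measure with trained linguistic models.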
Another criterion of text coherence is the presence of structural consistency between its elements (sentences and phrases). Verification of the adherence level to the
1 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine, †[email protected]; ‡artemkram