Arabic text summarization using deep learning approach
Molham Al‑Maleh1* and Said Desouki2

*Correspondence: [email protected]
1 Faculty of Information Technology, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Full list of author information is available at the end of the article.
Abstract

Natural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks such as machine translation and sentiment analysis, has used deep neural network models to improve results. Recent text summarization methods follow the sequence-to-sequence encoder–decoder framework, in which two neural networks are trained jointly to map an input text to an output summary. Deep neural networks take advantage of big datasets to improve their results. These networks are supported by the attention mechanism, which handles long texts more efficiently by identifying focus points in the input, and by the copy mechanism, which allows the model to copy words from the source directly into the summary. In this research, we re-implement the basic summarization model that applies the sequence-to-sequence framework to the Arabic language, which has not previously seen this model applied to text summarization. First, we build an Arabic dataset of summarized article headlines. This dataset consists of approximately 300 thousand entries, each pairing an article introduction with its corresponding headline. We then apply baseline summarization models to this dataset and compare the results using the ROUGE metric.

Keywords: Natural language processing, Text summarization, Deep learning, Big data, Sequence-to-sequence framework
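To make the framework concrete, the following is a minimal PyTorch sketch of a sequence-to-sequence encoder–decoder with attention of the kind described above. All names, layer sizes, and the additive-style attention scoring are illustrative assumptions, not the authors' exact architecture; the copy mechanism is omitted for brevity.

```python
# Minimal seq2seq-with-attention sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqSummarizer(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional encoder reads the article tokens.
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # Unidirectional decoder generates the headline one token at a time,
        # conditioned on the previous attention context.
        self.decoder = nn.LSTMCell(emb_dim + 2 * hid_dim, hid_dim)
        self.attn = nn.Linear(2 * hid_dim + hid_dim, 1)  # additive-style score
        self.out = nn.Linear(hid_dim + 2 * hid_dim, vocab_size)

    def forward(self, src, tgt):
        enc_out, _ = self.encoder(self.embed(src))        # (B, S, 2H)
        B, S, _ = enc_out.shape
        h = enc_out.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        ctx = enc_out.new_zeros(B, enc_out.size(-1))
        logits = []
        for t in range(tgt.size(1)):
            emb = self.embed(tgt[:, t])                   # teacher forcing
            h, c = self.decoder(torch.cat([emb, ctx], dim=-1), (h, c))
            # Attention: score every source position against the decoder state.
            scores = self.attn(torch.cat(
                [enc_out, h.unsqueeze(1).expand(B, S, -1)], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)           # (B, S)
            ctx = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)
            logits.append(self.out(torch.cat([h, ctx], dim=-1)))
        return torch.stack(logits, dim=1)                 # (B, T, vocab)

# Example usage with random token ids (hypothetical vocabulary of 5,000):
model = Seq2SeqSummarizer(vocab_size=5000)
src = torch.randint(0, 5000, (4, 50))   # a batch of 4 article introductions
tgt = torch.randint(0, 5000, (4, 12))   # their headlines (teacher forcing)
print(model(src, tgt).shape)            # torch.Size([4, 12, 5000])
```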
Introduction

The task of text summarization is one of the most important challenges facing computers, despite all their recent advances. The task consists of generating a short text from a longer one such that the short text retains the most important information of the original. Two basic methodologies are used to summarize texts: extractive summarization, from which most systems with good results have come, and abstractive summarization, which simulates human summarization. The first methodology identifies the important parts of the text, either with a statistical approach, as in the work of Belkebir et al. [1], or with a semantic approach, as in the work of Imam et al. [2] on the Arabic language; it then builds the summary by extracting these parts and linking them together, as done by Knight et al. [3] on the English language. The second methodology simulates the way humans summarize, which is based on understanding the original text and then generating a new, shorter text that conveys its main content.
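As a toy illustration of the statistical extractive approach, the following Python sketch scores sentences by the average frequency of their words and keeps the top-ranked ones. This is a generic frequency heuristic chosen for illustration only, not the specific methods of [1] or [2].

```python
# Toy extractive summarizer: rank sentences by average word frequency.
from collections import Counter

def extractive_summary(text, n_sentences=2):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)

    # Score a sentence by the average corpus frequency of its words.
    def score(sentence):
        tokens = sentence.lower().split()
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original ordering of the selected sentences.
    return '. '.join(s for s in sentences if s in ranked) + '.'

print(extractive_summary(
    "Deep learning improves summarization. Summarization produces short text. "
    "Short text keeps the most important information. Cats sleep a lot."))
```

Real extractive systems refine this idea with richer statistical or semantic sentence features, but the select-and-link structure of the output is the same.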