Arabic text summarization using deep learning approach
Molham Al‑Maleh1* and Said Desouki2

*Correspondence: [email protected]
1 Faculty of Information Technology, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Full list of author information is available at the end of the article.
Abstract

Natural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks such as machine translation and sentiment analysis, has used deep neural network models to improve results. Recent text summarization methods follow the sequence-to-sequence encoder–decoder framework, in which two neural networks are trained jointly to map an input text to an output summary. Deep neural networks take advantage of big datasets to improve their results. These networks are supported by the attention mechanism, which handles long texts more efficiently by identifying focus points in the input, and by the copy mechanism, which allows the model to copy words from the source directly into the summary. In this research, we re-implement the basic summarization model that applies the sequence-to-sequence framework to the Arabic language, which has not previously seen this model applied to text summarization. First, we build an Arabic dataset of summarized article headlines. This dataset consists of approximately 300 thousand entries, each pairing an article introduction with its corresponding headline. We then apply baseline summarization models to this dataset and compare the results using the ROUGE metric.

Keywords: Natural language processing, Text summarization, Deep learning, Big data, Sequence-to-sequence framework
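To make the framework concrete, the following is a minimal PyTorch sketch of a sequence-to-sequence encoder–decoder with attention of the kind described above. All names, layer sizes, and the additive-style attention scoring are illustrative assumptions, not the authors' exact architecture; the copy mechanism is omitted for brevity.

```python
# Minimal seq2seq-with-attention sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqSummarizer(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional encoder reads the article tokens.
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # Unidirectional decoder generates the headline one token at a time,
        # conditioned on the previous attention context.
        self.decoder = nn.LSTMCell(emb_dim + 2 * hid_dim, hid_dim)
        self.attn = nn.Linear(2 * hid_dim + hid_dim, 1)  # additive-style score
        self.out = nn.Linear(hid_dim + 2 * hid_dim, vocab_size)

    def forward(self, src, tgt):
        enc_out, _ = self.encoder(self.embed(src))        # (B, S, 2H)
        B, S, _ = enc_out.shape
        h = enc_out.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        ctx = enc_out.new_zeros(B, enc_out.size(-1))
        logits = []
        for t in range(tgt.size(1)):
            emb = self.embed(tgt[:, t])                   # teacher forcing
            h, c = self.decoder(torch.cat([emb, ctx], dim=-1), (h, c))
            # Attention: score every source position against the decoder state.
            scores = self.attn(torch.cat(
                [enc_out, h.unsqueeze(1).expand(B, S, -1)], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)           # (B, S)
            ctx = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)
            logits.append(self.out(torch.cat([h, ctx], dim=-1)))
        return torch.stack(logits, dim=1)                 # (B, T, vocab)

# Example usage with random token ids (hypothetical vocabulary of 5,000):
model = Seq2SeqSummarizer(vocab_size=5000)
src = torch.randint(0, 5000, (4, 50))   # a batch of 4 article introductions
tgt = torch.randint(0, 5000, (4, 12))   # their headlines (teacher forcing)
print(model(src, tgt).shape)            # torch.Size([4, 12, 5000])
```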
Introduction

The task of text summarization is one of the most important challenges facing computers, despite all their recent advances. The task consists of generating a short text from a longer one such that the short text retains the most important information of the original. Two basic methodologies are used to summarize texts: extractive summarization, from which most systems with good results have come, and abstractive summarization, which simulates human summarization. The first methodology identifies the important parts of the text, either with a statistical approach, as in the work of Belkebir et al. [1], or with a semantic approach, as in the work of Imam et al. [2] on the Arabic language; it then builds the summary by extracting these parts and linking them together, as done by Knight et al. [3] on the English language. The second methodology simulates the way humans summarize, which is based on understanding the original text and then generating a new, shorter text that conveys its main content.
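As a toy illustration of the statistical extractive approach, the following Python sketch scores sentences by the average frequency of their words and keeps the top-ranked ones. This is a generic frequency heuristic chosen for illustration only, not the specific methods of [1] or [2].

```python
# Toy extractive summarizer: rank sentences by average word frequency.
from collections import Counter

def extractive_summary(text, n_sentences=2):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)

    # Score a sentence by the average corpus frequency of its words.
    def score(sentence):
        tokens = sentence.lower().split()
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original ordering of the selected sentences.
    return '. '.join(s for s in sentences if s in ranked) + '.'

print(extractive_summary(
    "Deep learning improves summarization. Summarization produces short text. "
    "Short text keeps the most important information. Cats sleep a lot."))
```

Real extractive systems refine this idea with richer statistical or semantic sentence features, but the select-and-link structure of the output is the same.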