A new graph-based extractive text summarization using keywords or topic modeling



ORIGINAL RESEARCH

Ramesh Chandra Belwal¹ · Sawan Rai¹ · Atul Gupta¹

Received: 20 April 2020 / Accepted: 3 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

In graph-based extractive text summarization techniques, the weight assigned to the edges of the graph is the crucial parameter for sentence ranking. The edge weights are based on the similarity between sentences (nodes). Most graph-based techniques use a similarity measure based on common words to assign these weights. In this paper, we propose a new graph-based summarization technique which, besides taking into account the similarity among the individual sentences, also considers the similarity between the sentences and the overall (input) document. When assigning weights to the edges of the graph, we consider two attributes. The first attribute is the similarity between the two nodes that an edge connects. The second attribute is the weight of a component that represents how similar the particular edge is to the topics of the overall document, for which we incorporate topic modeling. Along with these modifications, we use a semantic measure to find the similarity among the nodes. The evaluation results of the proposed method demonstrate a significant improvement in summary quality over existing text summarization techniques.

Keywords  Text summarization · Extractive summarization · Graph-based · Topic-based · Similarity measure
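The two-attribute edge weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration only: the linear mix controlled by a hypothetical `alpha` parameter, the Jaccard word overlap (used here in place of the paper's semantic similarity measure), the hand-supplied `topic_words` list (used in place of a topic model), and the PageRank-style ranking loop. None of these names come from the paper.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a semantic measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weights(sentences, topic_words, alpha=0.5):
    """Build a symmetric weight matrix from two attributes per edge.

    Assumption: weight = alpha * sentence similarity
                       + (1 - alpha) * topic relevance of the edge.
    """
    tokens = [tokenize(s) for s in sentences]
    topics = {w.lower() for w in topic_words}
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        similarity = jaccard(tokens[i], tokens[j])
        if similarity == 0.0:
            continue  # no shared words, so no edge between these nodes
        # Fraction of topic words covered by the two sentences together.
        covered = (tokens[i] | tokens[j]) & topics
        topic_relevance = len(covered) / len(topics) if topics else 0.0
        weight = alpha * similarity + (1 - alpha) * topic_relevance
        weights[i][j] = weights[j][i] = weight
    return weights


def rank_sentences(weights, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank-style iteration."""
    n = len(weights)
    if n == 0:
        return []
    totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / totals[j] * scores[j]
                for j in range(n)
                if totals[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, topic_words, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(edge_weights(sentences, topic_words))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, topic_words, k=2)` on a list of sentence strings returns the two top-ranked sentences. In this sketch, a sentence that shares no words with any other sentence gets no edges and therefore receives only the baseline score.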

* Ramesh Chandra Belwal · Sawan Rai · Atul Gupta
¹ Department of Computer Science and Engineering, Indian Institute of Information Technology Design and Manufacturing, Jabalpur, India

1 Introduction

The amount of data available on the Internet has reached such an enormous volume that it is infeasible for human beings to extract useful information from it within the desired time. Without summaries, it is impractical for users to read the vast amount of information available online (Saggion and Poibeau 2013). Hence, we need a method that effectively captures the essence of a large text within the desired time. Text summarization is the process of creating a compressed, shorter version of a given text document that retains the information useful to its readers. The fundamental aim of text summarization is to reduce the content and size of the given text to its important points (Alterman 1991). Using computer algorithms, summarization methods produce a summary of a given text while retaining its original meaning (Mirshojaee et al. 2020).

Text summarization can be categorized based on various parameters. On the basis of the output type, summarization can be either abstractive or extractive. Extractive summarizers produce summaries by selecting a few relevant or important sentences from the original document. In abstractive summarization, the summary is gener