An Automatic Text Summarization on Naive Bayes Classifier Using Latent Semantic Analysis

A huge amount of information is currently available on the Internet, but it is difficult to find the relevant information quickly and efficiently. Large collections of textual data are available on the Internet. A very competent system is required to find the most ap



C. Shah (B), Shankersinh Vaghela Bapu Institute of Technology, Gandhinagar, India, e-mail: [email protected]
A. Jivani, The Maharaja Sayajirao University of Baroda, Vadodara, India, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2019. R. K. Shukla et al. (eds.), Data, Engineering and Applications, https://doi.org/10.1007/978-981-13-6347-4_16

1 Introduction

Automatic text summarization is the process of reducing a text to a good summary [1]. Natural language processing and machine learning are the major areas on which text summarization draws. This paper presents an approach to text summarization based on text extraction. Automatic text summarization creates a shorter version of a given source, such as a news article, that retains the main information of the source so that the user can understand the concept of the article. There are two types of summarization: abstractive and extractive. Extractive summarization uses specific techniques to score the sentences of the retrieved document, selects the sentences with the highest scores, and consolidates them into the summary. Abstractive summarization instead converts the original text into a semantically similar form with the help of linguistic methods to obtain a comprehensive version of the original document.

Latent semantic analysis is used for dimension reduction, and once the summary is ready, a Naïve Bayes classifier is used for training the model. The summary is predicted once the model is ready. The primary step of text summarization is to identify the important features. After that, preprocessing, which includes sentence segmentation, tokenization, stop word removal, and stemming, plays a major role. Once preprocessing is complete and the corpus is ready, the document matrix and the tf-idf matrix are built on the corpus for each term. Singular-value decomposition (SVD) is applied to the tf-idf matrix to obtain the different concepts. Within a concept, on the basis of a threshold, the value given for prediction is either 1 (part of the summary) or 0 (not part of the summary). There are many concepts, but since not all of them are useful, recursive feature elimination is used in the present study to select only the important ones. The Naïve Bayes classifier is applied to the selected concepts for training and predicting the summary, which is generated on the basis of the SVD calculation.

Section 2 of the paper describes related work on automatic text summarization. Section 3 presents the proposed method for improving text summarization, and Sect. 4 presents the results and discussion. Conclusions are covered in Sect. 5.
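
The first stages of the pipeline outlined in this section (preprocessing, tf-idf weighting, extraction of a latent concept, and thresholding into 1/0 labels) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop word list, the suffix-stripping stemmer, the smoothed idf formula, and the mean-score threshold are all assumptions made for illustration. Power iteration on the leading concept stands in for a full SVD, and recursive feature elimination and Naïve Bayes training are omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "it", "for"}


def split_sentences(text):
    """Sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Tokenize, lowercase, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(token_lists):
    """Return (vocabulary, matrix) with one row per sentence, one column per term."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed idf (assumed variant) so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[t] * idf[t] for t in vocab])  # raw count times idf
    return vocab, rows


def leading_concept_scores(matrix, iterations=100):
    """Score sentences on the leading latent concept via power iteration on A^T A."""
    n_terms = len(matrix[0])
    v = [1.0 / math.sqrt(n_terms)] * n_terms
    for _ in range(iterations):
        # Compute A v, then A^T (A v), then renormalize.
        av = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        atav = [sum(matrix[i][j] * av[i] for i in range(len(matrix)))
                for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def label_sentences(scores):
    """Label 1 (part of summary) if the score reaches the mean score, else 0."""
    threshold = sum(scores) / len(scores)
    return [1 if s >= threshold else 0 for s in scores]


text = ("Automatic summarization shortens a document. "
        "Extractive summarization selects important sentences from the document. "
        "The weather was pleasant. "
        "Sentences are scored and the highest scoring sentences form the summary.")
sentences = split_sentences(text)
vocab, matrix = tfidf_matrix([preprocess(s) for s in sentences])
scores = leading_concept_scores(matrix)
labels = label_sentences(scores)
```

In this sketch the rows of `matrix` correspond to sentences, so `labels` plays the role of the 1/0 targets described above. A fuller version would keep several concepts from the SVD as features, prune them with recursive feature elimination, and train a Naïve Bayes classifier on the result.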

2 Related Work

Luhn (1958) [2] introduced the first text summarization method, based on term frequency. In 1969, automatic text summarization used several standard methods to assign sentence weights: the cue, title, and location methods. In the early 1990s, machine learning techniques were