Opinion mining with reviews summarization based on clustering

ORIGINAL RESEARCH

Shabnam Bagheri Marzijarani¹ • Hedieh Sajedi²

Received: 9 May 2020 / Accepted: 25 August 2020
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2020

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s41870-020-00511-y) contains supplementary material, which is available to authorized users.

✉ Hedieh Sajedi
[email protected]

Shabnam Bagheri Marzijarani
[email protected]

¹ Department of Information Technology, Faculty of Mechanics, Electrical Power and Computer, Science and Research Branch, Islamic Azad University, Tehran, Iran

² Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, 14155-6455 Tehran, Iran

Abstract Automatic text summarization can be used in recommendation systems to present useful content drawn from the available comments and texts. To summarize, a human reads the whole text and builds a background understanding of it; computers work differently. Several methods for automatic text summarization have been proposed to date, ranging from abstractive methods, which generate new sentences from the important points in the texts, to extractive methods, which select original key sentences from the text. In this study, we present an extractive text summarization method. First, the sentences are preprocessed, and the similarities between sentences are calculated with a proposed similarity measure. Next, the sentences are clustered based on these similarities, and finally a fixed number of sentences is selected from each cluster. The Gaussian Mixture Model (GMM) algorithm is used to cluster the sentences. The proposed method is tested on a dataset of customer reviews collected from Tripadvisor (https://www.tripadvisor.com/), and the results show that using GMM produces a more informative summary, with more varied sentences, than K-means.

Keywords Text summarization · Gaussian mixture model · K-means · Clustering · Sentence similarity
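The abstract outlines a four-step extractive pipeline: preprocess the sentences, compute pairwise similarities, cluster the sentences with a GMM, and keep a fixed number of sentences from each cluster. The sketch below illustrates that flow in plain Python under several assumptions that are not taken from the paper. Its similarity measure is not described in this excerpt, so cosine similarity over term-frequency vectors stands in for it. Each sentence is represented by its row of the similarity matrix, the GMM uses diagonal covariances fit by EM with a deterministic farthest-point initialization, and the per-cluster pick is the sentence with the highest total similarity to the rest of its cluster. The stopword list and all function names (`preprocess`, `similarity_matrix`, `gmm_em`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual preprocessing is not given in this excerpt.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "were", "to", "of", "in",
             "it", "very", "we", "our", "with"}


def preprocess(sentence):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (a stand-in measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(sentences):
    """Row i holds sentence i's similarity to every sentence; used as its feature vector."""
    bags = [Counter(preprocess(s)) for s in sentences]
    return [[cosine(x, y) for y in bags] for x in bags]


def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest point."""
    def dist2(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    chosen = [0]
    while len(chosen) < k:
        chosen.append(max(range(len(X)), key=lambda i: min(dist2(X[i], X[c]) for c in chosen)))
    return [list(X[i]) for i in chosen]


def gmm_em(X, k, iters=50, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture with EM; return responsibilities."""
    n, d = len(X), len(X[0])
    means = farthest_point_init(X, k)
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component for each point.
        resp = []
        for x in X:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for v, m, s2 in zip(x, means[j], variances[j]):
                    lp -= 0.5 * (math.log(2 * math.pi * s2) + (v - m) ** 2 / s2)
                logs.append(lp)
            top = max(logs)  # subtract the max before exp for numerical stability
            probs = [math.exp(lg - top) for lg in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances (floored by eps).
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            variances[j] = [sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, X)) / nj + eps
                            for t in range(d)]
    return resp


def summarize(sentences, k, per_cluster=1):
    """Cluster sentences with the GMM, then keep the most central sentence(s) per cluster."""
    X = similarity_matrix(sentences)
    resp = gmm_em(X, k)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    picked = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        # Hypothetical selection rule: highest total similarity to the rest of the cluster.
        ranked = sorted(members, key=lambda i: sum(X[i][m] for m in members), reverse=True)
        picked.extend(ranked[:per_cluster])
    return [sentences[i] for i in sorted(picked)]


if __name__ == "__main__":
    reviews = [
        "The room was clean and the bed was comfortable.",
        "Our room was spacious, clean and quiet.",
        "The bed in the room was very comfortable.",
        "Breakfast food was delicious and fresh.",
        "The restaurant food was delicious.",
        "Fresh breakfast with delicious coffee.",
    ]
    # Room and food reviews land in separate clusters; one sentence is kept from each.
    print(summarize(reviews, k=2))
```

Unlike K-means, which assigns each sentence to exactly one centroid, a GMM returns soft responsibilities and learns a separate variance per dimension for each component, so clusters may differ in spread. This sketch only uses the arg-max label when selecting sentences; whether and how the paper exploits the soft assignments is not stated in this excerpt.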

1 Introduction

One of the biggest problems the world of technology faces is an overload of messy, contradictory, and noisy data, especially when it is in natural-language form. Today’s global society is shaped by advances in processing power, data storage, and Internet speed. Human relationships have become global, and borders have been broken down. People can post their ideas on social networks and online forums. Over the past 5 years, more and more companies have been closing their physical stores/locations because they are selling products and services online at a rapid rate [1]. With the ever-growing reliance on online business, past customers’ online reviews have come to play an important role: new customers research companies’ reputations by reading online reviews, which makes these reviews a crucial part of web-based businesses. However, what if one product or one service has too m