An application of MOGW optimization for feature selection in text classification


Razieh Asgarnezhad1 · S. Amirhassan Monadjemi2 · Mohammadreza Soltanaghaei1

Accepted: 23 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Owing to the spread of web applications, sentiment classification (SC) has become a topic of interest among text mining experts. The sheer volume of online reviews hinders the application of effective models in companies and in individual decision making. Pre-processing contributes greatly to sentiment classification. Traditional bag-of-words approaches do not capture the multiple relationships among words. This study emphasizes the pre-processing stage and data reduction techniques, which can make a substantial difference in sentiment classification efficiency. To classify opinions, a multi-objective-grey wolf-optimization algorithm is proposed whose two objectives are to decrease the errors of the Naïve Bayes and K-nearest neighbour classifiers, with a neural network serving as the final classifier. The proposed framework is evaluated on three datasets. With 95.76% precision, 95.75% accuracy, 95.99% recall, and 95.82% F-measure, the framework outperforms its counterparts.

Keywords  Sentiment classification · Feature selection · Multi-objective-grey wolf-optimization · Naïve Bayes · K-nearest neighbour · Multi-layer neural network
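The two-objective formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grey wolf search itself is omitted, the toy reviews are made up, a 1-nearest-neighbour rule stands in for K-nearest neighbour, and leave-one-out error is an assumed way of scoring a candidate feature subset. All function and variable names are hypothetical. The sketch only shows how a binary feature mask could be scored on two error objectives and how two candidate masks could be compared by Pareto dominance.

```python
import math
from collections import Counter

# Toy labelled reviews (invented for illustration; not the paper's datasets).
DOCS = [
    ("good great film", 1),
    ("great acting good plot", 1),
    ("bad boring film", 0),
    ("boring plot bad acting", 0),
]
VOCAB = sorted({w for text, _ in DOCS for w in text.split()})

def mask_for(words):
    """Binary feature mask over VOCAB that keeps only the given words."""
    return [w in words for w in VOCAB]

def vectorize(text, mask):
    """Bag-of-words counts restricted to the features switched on in mask."""
    counts = Counter(text.split())
    return [counts[w] for w, keep in zip(VOCAB, mask) if keep]

def nb_predict(train, x):
    """Multinomial Naive Bayes with Laplace smoothing."""
    best_label, best_score = None, -math.inf
    for label in sorted({y for _, y in train}):
        rows = [v for v, y in train if y == label]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + len(totals)
        score = math.log(len(rows) / len(train))
        score += sum(c * math.log((t + 1) / denom) for c, t in zip(x, totals))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def knn_predict(train, x):
    """1-nearest neighbour under squared Euclidean distance (assumed k = 1)."""
    nearest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return nearest[1]

def loo_error(predict, mask):
    """Leave-one-out error rate of a classifier on the masked features."""
    data = [(vectorize(text, mask), y) for text, y in DOCS]
    wrong = sum(predict(data[:i] + data[i + 1:], x) != y
                for i, (x, y) in enumerate(data))
    return wrong / len(data)

def objectives(mask):
    """The two objectives to minimise: (Naive Bayes error, KNN error)."""
    return loo_error(nb_predict, mask), loo_error(knn_predict, mask)

def dominates(a, b):
    """Pareto dominance: a is no worse on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Two candidate feature subsets: sentiment-bearing words versus neutral words.
SENTIMENT_MASK = mask_for({"good", "great", "bad", "boring"})
NEUTRAL_MASK = mask_for({"acting", "film", "plot"})

if __name__ == "__main__":
    print(objectives(SENTIMENT_MASK), objectives(NEUTRAL_MASK))
```

In a multi-objective search, a candidate mask would be kept only if no other candidate dominates it; on this toy data the sentiment-word mask dominates the neutral-word mask.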

* S. Amirhassan Monadjemi [email protected]
Razieh Asgarnezhad [email protected]
Mohammadreza Soltanaghaei [email protected]

1 Department of Computer Engineering, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
2 Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran




1 Introduction

With the explosion of information on the Internet, it is hard to make decisions based on reviews, tweets, and similar content. People purchase products on the Internet and immediately express their opinions. These opinions have a significant effect on the financial statements of the companies involved. The main problem in this process is that the opinions are expressed in natural language. A big gap exists between opinions in natural language (i.e., unstructured data) and the applications that operate on structured data [1]. More than 80% of stored knowledge takes the form of text, documents, video, and voice media. In computer science, such documents are considered unstructured. In knowledge extraction, the content must be understood before its implicit meanings and concepts can be searched. Idea mining in a text is tied to the technical side of what humans can search for. Search engines look for keywords when retrieving text data, and keywords reflect the facts likely to be presented, not ideas. Expressing ideas through keywords is impossible [2].

Sentiment classification (SC) is an appealing field in text mining. The opinions extracted from unstructured data on the Internet become class