A class imbalance-aware review rating prediction using hybrid sampling and ensemble learning

Anbazhagan Mahadevan¹ · Michael Arock²

Received: 23 July 2019 / Revised: 28 September 2020 / Accepted: 6 October 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Imbalanced distribution of instances across classes is a challenging issue in classification problems, because classifiers tend to favor the classes with a large number of instances; that is, instances of minority classes may be misidentified as instances of majority classes. In recent years, a great deal of research has addressed the class imbalance issue in binary classification problems, producing many class imbalance learning techniques for binary classification. Class imbalance in multi-class classification problems, however, has drawn little attention from the research community. Unlike binary class imbalance learning, multi-class imbalance learning must deal with more than one majority class and more than one minority class. This paper proposes a multi-class imbalance learning technique that can overcome the effects of the multi-class imbalance problem in review rating prediction tasks. The proposed model handles multi-class imbalance by combining hybrid sampling and ensemble learning techniques. Random Under Sampling (RUS) and the Synthetic Minority Oversampling TEchnique (SMOTE) are used jointly in the proposed model to create balanced training sets for the base learners. The proposed model also builds a powerful ensemble structure by combining a manually created bagging ensemble with AdaBoost boosting ensembles. Experiments on the Amazon product dataset are used to investigate the performance of the proposed model. The experimental results show that the proposed Class Imbalance-Aware Review rating prediction (CIAR) model outperforms almost all the baseline models in terms of G-mean, F-Score, and ROC AUC score.

Keywords Class imbalanced learning · Ensemble learning · Machine learning classification · Over sampling · Under sampling · Review rating prediction
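The hybrid sampling step described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' CIAR implementation: the helper names (`hybrid_balance`, `smote_class`, `multiclass_g_mean`), the choice of the mean class size as the balancing target, and k = 5 neighbours are all assumptions made for illustration. Classes larger than the target are randomly under-sampled (RUS), and smaller classes are topped up with SMOTE-style synthetic points interpolated between a sample and one of its nearest same-class neighbours. The G-mean is computed here as the geometric mean of per-class recalls, which is one common multi-class definition.

```python
import math
import random
from collections import Counter, defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_class(samples, n_new, k, rng):
    """Hypothetical helper: create n_new SMOTE-style points for one class.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: _sq_dist(base, s),
        )[:k]
        if not neighbours:  # a single-sample class can only be duplicated
            synthetic.append(list(base))
            continue
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def hybrid_balance(X, y, target=None, k=5, seed=0):
    """Hypothetical helper: bring every class to `target` samples.

    Larger classes are randomly under-sampled without replacement (RUS);
    smaller classes are oversampled with smote_class. By assumption,
    `target` defaults to the mean class size.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))
    if target is None:
        target = len(y) // len(by_class)
    X_out, y_out = [], []
    for label in sorted(by_class):
        samples = by_class[label]
        if len(samples) > target:
            kept = rng.sample(samples, target)
        else:
            kept = samples + smote_class(samples, target - len(samples), k, rng)
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


def multiclass_g_mean(y_true, y_pred):
    """Geometric mean of per-class recall; it is 0 if any class gets no correct hit."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


# Toy data (invented for illustration): 1-5 star ratings, 2-D features,
# with rating 5 dominating and rating 1 rare.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [5] * 60 + [4] * 20 + [3] * 10 + [2] * 7 + [1] * 3
Xb, yb = hybrid_balance(X, y)
print(Counter(yb))  # every rating class ends up with 20 samples
print(multiclass_g_mean([1, 1, 2, 2], [1, 2, 2, 2]))  # sqrt(0.5 * 1.0)
```

Calling `hybrid_balance` with a different `seed` selects different majority-class samples and different synthetic points, which is one way each base learner of a bagging ensemble could receive its own balanced training set. A production pipeline would more likely rely on a maintained library such as imbalanced-learn for the sampling step.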

Anbazhagan Mahadevan

¹ Department of Computer Science and Engineering, Madanapalle Institute of Technology and Science, Chittoor, Andhra Pradesh, India

² Department of Computer Applications, National Institute of Technology, Trichy, Tamil Nadu, India

Multimedia Tools and Applications

1 Introduction

E-Commerce websites have made our lives easier and more convenient by allowing us to buy or sell anything with a few clicks from the comfort of our living rooms. With the advent of smartphones and wireless Internet connectivity, E-Commerce websites have entirely changed the way we buy and sell products. Conventional word-of-mouth recommendations have largely given way to other users' experiences with a product, expressed in the form of text reviews. Reviews can help the sellers sel