Digital media news categorization using Bernoulli document model for web content convergence


ORIGINAL ARTICLE

Pradeep Kumar Mallick 1 & Sushruta Mishra 1 & Gyoo-Soo Chae 2

Received: 28 May 2020 / Accepted: 9 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract News content in the digital medium converges from many distinct sources, and web content carries a massive number of features. Complete coverage of all kinds of news is vital for a news agency to retain customer confidence and keep a competitive edge over other agencies. Aggregating such massive news content from heterogeneous sources requires an integration of convergent computing. Classifying online news is challenging in the Internet age, where news keeps flowing from several heterogeneous sources. With the constant rise in manipulation of web content, accurately assigning each news item to its respective class has become a pressing need. An automated predictive approach can therefore be of great use in organizing and classifying news across a pool of web portals. This study applies the Bernoulli model to multi-class categorization of digital news arriving in real time. The system model was evaluated using Python, and the results were demonstrated on six distinct classes of news with a 6000-feature dataset drawn from the TagMyNews dataset. The Bernoulli model achieved a classification accuracy of 98.4%, a precision of 92.7%, a recall of 90.6%, and an F-score of 91.4%, with an execution time of only 12 s. Compared with other well-known related works, the Bernoulli model gave the best performance and substantially improved news classification efficiency.

Keywords Text classification · Bernoulli model · News categorization · Precision · Recall · Accuracy rate
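To make the Bernoulli document model concrete, the following is a minimal sketch of a Bernoulli naive Bayes text classifier in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`train_bernoulli_nb`, `predict`), the add-one smoothing, and the four toy headlines are all hypothetical and are not taken from the TagMyNews dataset. The key property shown is that each document is reduced to word presence or absence, and absent vocabulary words also contribute to the class score.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels):
    """Fit a Bernoulli document model with add-one smoothing (an assumption).

    docs   : list of token lists
    labels : list of class names, one per document
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = defaultdict(int)               # number of documents per class
    df = defaultdict(lambda: defaultdict(int))  # df[c][w]: docs of class c containing w
    for doc, c in zip(docs, labels):
        class_docs[c] += 1
        for tok in set(doc):                    # presence only; repeat counts are ignored
            df[c][tok] += 1
    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_docs.items()}
    # P(w present | c), smoothed over the two outcomes "present" and "absent"
    p_word = {
        c: {w: (df[c][w] + 1) / (class_docs[c] + 2) for w in vocab}
        for c in class_docs
    }
    return vocab, log_prior, p_word


def predict(doc, model):
    """Return the class with the highest log posterior for one token list."""
    vocab, log_prior, p_word = model
    present = set(doc)                          # tokens outside the vocabulary are ignored
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in vocab:                         # absent words contribute log(1 - p)
            p = p_word[c][w]
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "team wins the final match".split(),
    "striker scores late goal".split(),
    "stocks fall as markets slide".split(),
    "bank raises interest rates".split(),
]
train_labels = ["sport", "sport", "business", "business"]

model = train_bernoulli_nb(train_docs, train_labels)
print(predict("late goal wins match".split(), model))  # sport
```

Working in log space avoids numerical underflow when the vocabulary is large, which would matter at the 6000-feature scale mentioned in the abstract.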

* Pradeep Kumar Mallick
* Sushruta Mishra
Gyoo-Soo Chae

1 School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) Deemed to be University, Bhubaneswar, Odisha, India

2 Division of Information & Communication Engineering, Baekseok University, Cheonan 330-704, South Korea

1 Introduction

Massive amounts of online textual data are available in the form of digital libraries, data repositories, and social network sources such as e-mails and blogs [1]. Textual data flows in from wireless media, cloud storage, mobile devices, other ubiquitous devices, and many other intelligent systems. Gathering and aggregating these data features is a challenging task. With each passing day, the rate of information intake is exponentially risin