An efficient ACO-PSO-based framework for data classification and preprocessing in big data

SPECIAL ISSUE

Ashutosh Kumar Dubey¹ · Abhishek Kumar¹ · Rashmi Agrawal²
Received: 23 May 2020 / Revised: 11 August 2020 / Accepted: 22 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Big data plays a prominent role in the systematic extraction and analysis of huge or complex datasets, and it supports data management better than traditional data-processing mechanisms. In this paper, an efficient framework based on ant colony optimization (ACO) and particle swarm optimization (PSO) is proposed for data classification and preprocessing in the big data environment. The framework shows that the content part can be combined and fetched for analysis by integrating volume and velocity. Weight marking is then performed using volume and data variety, and finally, ranking is performed using the velocity and variety aspects of big data. Data preprocessing is performed using weights assigned on the basis of size, content, and keywords. ACO and PSO are then applied under different computational settings, such as uniform distribution, random initialization, epochs, iterations, and a time constraint, for both minimization and maximization. The weights are assigned automatically through an unbiased random mechanism, on a scale of 0–1, for all of the separated data. The simple adaptive weight (SAW) method is then applied for prioritization and ranking. The overall average classification accuracy is 98% for PSO-SAW and 95% for ACO-SAW; PSO-SAW outperforms ACO-SAW in all cases.
Keywords  ACO · PSO · SAW · Big data · Content based classification
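The abstract names three computational ingredients: unbiased random weights on a 0–1 scale, a swarm optimizer run with uniform random initialization, an iteration cap, a time constraint, and a minimize/maximize switch, and SAW-based ranking. The Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. The function names `assign_random_weights`, `saw_rank`, and `pso` are hypothetical. We assume that weights are rescaled to sum to 1, that every criterion is a benefit criterion (higher is better) and is min-max normalized, and that the SAW step follows the standard weighted-sum (simple additive weighting) form. The PSO is a textbook global-best variant with illustrative coefficients (w = 0.7, c1 = c2 = 1.5). This section does not describe how the optimizer output feeds SAW, so the helpers are kept independent, and ACO is not sketched.

```python
import random
import time


def assign_random_weights(criteria, seed=None):
    """Draw an unbiased weight for each criterion uniformly on the 0-1 scale.

    Assumption (not from the paper): weights are rescaled to sum to 1 so they
    can be used directly in a weighted sum; they stay within [0, 1].
    """
    rng = random.Random(seed)
    raw = {c: rng.uniform(0.0, 1.0) for c in criteria}
    total = sum(raw.values()) or 1.0  # guard against the (improbable) all-zero draw
    return {c: w / total for c, w in raw.items()}


def saw_rank(items, weights):
    """Rank items by a weighted sum of min-max normalized criterion values.

    Assumptions: standard simple additive weighting; every criterion is a
    benefit criterion (higher is better). Returns (ranked names, scores).
    """
    criteria = list(weights)
    lo = {c: min(v[c] for v in items.values()) for c in criteria}
    hi = {c: max(v[c] for v in items.values()) for c in criteria}
    scores = {}
    for name, values in items.items():
        total = 0.0
        for c in criteria:
            span = hi[c] - lo[c]
            # A constant criterion cannot separate items; give all of them 1.0.
            norm = (values[c] - lo[c]) / span if span else 1.0
            total += weights[c] * norm
        scores[name] = total
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


def pso(objective, dim, n_particles=20, iterations=100, time_limit=None,
        maximize=False, w=0.7, c1=1.5, c2=1.5, seed=None):
    """Textbook global-best PSO over [0, 1]^dim (illustrative coefficients).

    Particles start uniformly at random; the loop stops after `iterations`
    sweeps or once `time_limit` seconds have elapsed, whichever comes first.
    Returns (best position, objective value at that position).
    """
    rng = random.Random(seed)
    sign = -1.0 if maximize else 1.0

    def cost(x):
        # Maximizing f is handled as minimizing -f.
        return sign * objective(x)

    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    start = time.monotonic()
    for _ in range(iterations):
        if time_limit is not None and time.monotonic() - start > time_limit:
            break
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every particle remains a valid 0-1 weight vector.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            current = cost(pos[i])
            if current < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], current
                if current < gbest_cost:
                    gbest, gbest_cost = pos[i][:], current
    return gbest, sign * gbest_cost
```

For instance, `assign_random_weights(["size", "content", "keywords"], seed=1)` yields one weight per preprocessing criterion, and those weights can be passed to `saw_rank` together with per-document criterion values. Clamping particle positions to [0, 1] keeps every candidate on the same 0–1 scale as the weights. Reflecting particles at the bounds would be an equally reasonable choice.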

1 Introduction

The term big data refers to a large volume of data [1], which can be structured or unstructured. In terms of data processing, what matters most is how organizations use those data [2, 3]; today, organizations use big data widely to outperform their peers. Big data is commonly characterized by three properties: volume, velocity, and variety. Volume is the amount of data generated, velocity is the speed at which the data are generated [4–6], and variety refers to their structured and unstructured forms. The major benefits of big data are large-scale data processing, time and cost savings in analysis and forecasting, and efficiency gained through advanced tool support [7].

* Ashutosh Kumar Dubey

¹ Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
² Faculty of Computer Applications, Manav Rachna International Institute of Research and Studies, Faridabad, India

In 2018, Sternberg et al. [8] discussed big social data. They investigated Turkish Airlines with respect to improving its Facebook page, and their results show weak relationships between business analytics and Facebook data. In 2018, Mande et al. [9] discussed the large volumes of heterogeneous