Performance Analysis of DE over K-Means Proposed Model of Soft Computing


Abstract In the real world, data grows continuously, and such huge volumes of data are called big data. It is a well-known term for the exponential growth of data in both structured and unstructured formats. Data analysis is the process of cleaning and transforming data, extracting valuable statistics, supporting decision-making, and suggesting hypotheses, with the help of algorithms and procedures such as classification and clustering. In this paper we discuss big data analysis using soft computing techniques and propose pairing two different approaches, an evolutionary algorithm and a machine learning approach, in search of a better result.

Keywords Big data clustering · K-means algorithm · DE (differential evolution) · Data

Kapil Patidar (✉) · Manoj Kumar · Sushil Kumar
Department of CSE, Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India

© Springer Science+Business Media Singapore 2016
M. Pant et al. (eds.), Proceedings of Fifth International Conference on Soft Computing for Problem Solving, Advances in Intelligent Systems and Computing 436, DOI 10.1007/978-981-10-0448-3_42

1 Introduction

The amount of data generated grows drastically day by day. The popular term for data at the zettabyte scale is “big data.” The tremendous volume and variety of real-world data held in massive databases clearly overwhelm traditional manual methods of data analysis, such as spreadsheets and ad hoc queries. A new generation of tools and techniques is needed that can intelligently and automatically assist users in analyzing mountains of warehoused data for nuggets of useful knowledge. These techniques and tools are the subject of the field of Knowledge Discovery in Databases (KDD), which is the mining of interesting information or patterns from data in large databases [1].

At present, data mining tools are very expensive, and only a few companies can afford them. One technique used for data analysis is automatic cluster detection. For online analysis the algorithm must be fast: it should not be computationally exhaustive, yet it should still be likely to give a good result, ideally one close to the globally best set of distinct clusters.

Clustering is a common data mining task that has been studied for use in many areas of data mining and information retrieval. It is an important unsupervised classification technique in which a set of patterns, usually vectors in a multidimensional space, is grouped into clusters so that similar patterns belong to the same cluster and dissimilar patterns belong to different clusters. The aim of clustering techniques is to partition a heterogeneous multidimensional data set into groups with more homogeneous characteristics [2]. Unsupervised clustering may be generally classified into two types: ‘hierarchical’ and ‘partiti