Harmony Search for Data Mining with Big Data

1 Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa St. 75, 00-662 Warsaw, Poland
2 Faculty of Telecommunications, Electronics and Informatics, Gdańsk University of Technology, Narutowicza St. 11/12, 80-233 Gdańsk, Poland

Abstract. This paper proposes several harmony search algorithms for data mining with big data. Three areas of big data processing are studied as applications of the new metaheuristics. The first concerns the MapReduce architecture, which can be supported by a team of harmony search agents in a grid infrastructure. The second involves the use of harmony search to preprocess data series before data mining. The third is harmony search as a classification algorithm. Finally, results of numerical experiments are presented.

1 Introduction

The goal of this paper is to describe an approach based on harmony search (HS) for data mining with Big Data (BD). Although several data mining algorithms exist, including back-propagation neural networks and locally weighted linear regression, this set can be extended to better support parallelism in Big Data processing and to avoid some limitations of well-known machine learning procedures such as k-Means or support vector machines [5]. We assume that data is usually gathered from multiple sources, which may be heterogeneous and geographically spread across the world. The collected data may also be stored in distributed facilities.

Data mining algorithms can be applied to large-scale multimedia applications in a massively parallel way to ensure higher training capacity. In fact, both logistic regression and Gaussian discriminant analysis can be run concurrently on different parts of an image to make a decision about the whole pattern. Similarly, naive Bayes or independent variable analysis can be run simultaneously [12].

A motivation for this paper is the fact that the quality of BD processing mainly depends on services provided by public cloud computing platforms. In particular, the BigQuery service delivered by Google Cloud can be used to analyze data in the cloud with SQL-like queries. A query over a multi-terabyte dataset usually completes in 1–2 s. This service is easy to use and scalable. Consequently, BD services provided by public cloud computing platforms can offer real-time insights into large-scale multimedia data [11].

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. K. Saeed and W. Homenda (Eds.): CISIM 2016, LNCS 9842, pp. 553–565, 2016. DOI: 10.1007/978-3-319-45378-1_49

J. Balicki et al.

MapReduce is a batch-oriented parallel cloud computing model, and it can be applied to some machine learning algorithms because they regularly need to scan through the training data. It requires intensive computation to access the large
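The point that such algorithms repeatedly scan the training data can be illustrated with a minimal sketch. It is not the authors' method: it assumes logistic regression trained by plain gradient descent, and all names (`map_gradient`, `reduce_gradients`, `gradient_step`, the toy `shards`) are hypothetical. The map step scans one data shard and returns a partial gradient; the reduce step sums the partial gradients. Here the map step runs sequentially, whereas in a real MapReduce deployment each shard would be processed on a separate worker.

```python
from functools import reduce
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(shard, weights):
    """Map step: scan one shard and return its partial log-loss gradient."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad


def reduce_gradients(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def gradient_step(shards, weights, lr=0.1):
    """One full pass over the data: map over shards, reduce, update weights."""
    partials = [map_gradient(s, weights) for s in shards]  # parallelizable
    total = reduce(reduce_gradients, partials)
    n = sum(len(s) for s in shards)
    return [w - lr * g / n for w, g in zip(weights, total)]


# Toy data (assumed for illustration): ((bias, x), label), split into two shards.
shards = [
    [((1.0, 0.0), 0), ((1.0, 1.0), 0)],
    [((1.0, 3.0), 1), ((1.0, 4.0), 1)],
]

weights = [0.0, 0.0]
for _ in range(200):  # every iteration rescans all of the training data
    weights = gradient_step(shards, weights)
```

Because the gradient is a sum over training examples, the result of the reduce step does not depend on how the data is split into shards, which is what makes this class of algorithms a natural fit for the model.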
