Parallel and Distributed Data Mining in Cloud

Abstract. The paper describes an approach to the distributed execution of data mining algorithms and the use of this approach to build a Cloud for Data Mining. The suggested approach allows data mining algorithms to be executed in different parallel and distributed environments. Thus, the resulting Cloud for Data Mining can be used both as an analytic service and as a platform for researching and debugging parallel and distributed data mining algorithms.

Keywords: Data mining · Big data · Parallel algorithms · Cloud for Data Mining · Cloud computing

1 Introduction

Data volumes are growing rapidly. Data are collected in information systems, generated by various devices, saved as logs of computer systems’ operation, and so on. Modern data warehouses store large amounts of heterogeneous data. The terms “Big data”, “Internet of Things”, and “Cloud computing” have recently become very popular. They refer to technologies for collecting, storing, and handling large volumes of data with a variety of types and a high velocity of generation (Big data). However, all this is worthless unless we analyze the data and obtain new knowledge from it. Technologies such as Machine Learning, Data Mining, and Knowledge Discovery are used to discover new knowledge in data. They rely on complex mathematical methods and algorithms that need powerful computing resources to analyze Big data. Cloud and cluster technologies provide scalable resources, so integrating data mining with cloud computing is very important. The result of this integration is the creation of a Cloud for Data Mining (CDM). This solution has a number of advantages:

– users always have the latest version of the algorithm;
– algorithms can use all the computational resources available in the “cloud”;
– algorithms can be applied to data stored in the “cloud” and outside of it;
– users need not worry about scaling algorithms.

© Springer International Publishing Switzerland 2016 P. Perner (Ed.): ICDM 2016, LNAI 9728, pp. 349–362, 2016. DOI: 10.1007/978-3-319-41561-1_26

I. Kholod et al.

To be convenient for an analyst, the CDM should have the following features:

– a usable multiuser interface;
– execution of a full cycle of analysis;
– access to internal and external data sources;
– work with different data sources;
– use of all computing resources;
– a wide range of data mining algorithms, and others.

Additionally, the CDM must provide the following capabilities for data mining researchers and developers:

– a unified API;
– the ability to add new data mining algorithms;
– a wide range of parallel and distributed environments.

The paper describes an approach and an architecture for a CDM that has these capabilities. The paper is organized as follows. The next section reviews similar CDM systems. Section 3 describes a general approach that allows an algorithm decomposed into blocks to be mapped onto different distributed systems. Section 4 describes the CDM architecture. The last section discusses ex
