Parallel knowledge acquisition algorithms for big data using MapReduce


ORIGINAL ARTICLE

Jin Qian (1, 2) · Min Xia (2) · Xiaodong Yue (3)

Received: 16 February 2016 / Accepted: 7 December 2016 © Springer-Verlag Berlin Heidelberg 2017

Abstract With the volume of data growing at an unprecedented rate, knowledge acquisition for big data has become a new challenge. To address this issue, information granules are constructed in different hierarchical decision tables. Changes in the quantitative measures of support, confidence and coverage associated with hierarchical decision rules are then discussed to explain the relationships between condition granules and decision granules. Four different strategies for attribute level ascension are designed. Attribute level ascension can reduce the number of decision rules in most cases. An efficient parallel knowledge acquisition framework for big data using MapReduce is proposed and implemented. The experimental results demonstrate that the proposed algorithms can mine hierarchical decision rules at different levels of granularity for big data.
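The abstract names support, confidence and coverage, attribute level ascension and a MapReduce framework without defining them at this point. The following is a minimal sketch under stated assumptions, not the authors' algorithm.

It assumes the common Pawlak-style definitions for a decision rule X → D, where X is a condition granule and D is a decision granule:

- support = |X ∩ D| / |U|
- confidence = |X ∩ D| / |X|
- coverage = |X ∩ D| / |D|

The map, shuffle and reduce steps are simulated in plain Python rather than run on Hadoop. The toy table, the `ascend` concept hierarchy (raw age lifted to "young"/"old" bands) and every function name are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import chain

# Hypothetical toy decision table: ((age, income), decision).
TABLE = [
    (("25", "low"), "no"),
    (("27", "low"), "no"),
    (("45", "high"), "yes"),
    (("52", "high"), "yes"),
    (("48", "low"), "yes"),
    (("31", "high"), "no"),
    (("35", "low"), "yes"),
]


def ascend(cond):
    """Hypothetical attribute level ascension: lift raw age to a coarser band."""
    age, income = cond
    band = "young" if int(age) < 40 else "old"
    return (band, income)


def map_phase(split, level_fn):
    """Mapper: emit ((condition-granule key, decision), 1) for every object."""
    for cond, dec in split:
        yield (level_fn(cond), dec), 1


def reduce_phase(pairs):
    """Stand-in for shuffle + reducer: group identical keys and sum their counts."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def rules_with_measures(table, level_fn=lambda c: c, n_splits=2):
    """Return {(X, d): {support, confidence, coverage}} for each decision rule X -> d."""
    # Round-robin partition of the table, standing in for MapReduce input splits.
    splits = [table[i::n_splits] for i in range(n_splits)]
    joint = reduce_phase(chain.from_iterable(map_phase(s, level_fn) for s in splits))

    # Granule sizes |X| (condition granule) and |D| (decision granule).
    cond_size, dec_size = defaultdict(int), defaultdict(int)
    for (x, d), c in joint.items():
        cond_size[x] += c
        dec_size[d] += c

    n = len(table)
    return {
        (x, d): {
            "support": Fraction(c, n),                # |X ∩ D| / |U|
            "confidence": Fraction(c, cond_size[x]),  # |X ∩ D| / |X|
            "coverage": Fraction(c, dec_size[d]),     # |X ∩ D| / |D|
        }
        for (x, d), c in joint.items()
    }


if __name__ == "__main__":
    raw = rules_with_measures(TABLE)
    lifted = rules_with_measures(TABLE, ascend)
    print(len(raw), len(lifted))  # prints "7 5"
    print(lifted[(("young", "low"), "no")]["confidence"])  # prints "2/3"
```

Running the script prints `7 5` and `2/3`. Lifting age to two bands merges the seven singleton condition granules into four coarser ones, which cuts the rule set from seven rules to five. At the same time, the confidence of (young, low) → no drops from 1 to 2/3, so coarser granules can trade rule count against rule certainty.

In a real deployment the counting would run as Hadoop mapper and reducer tasks, and the granule sizes |X| and |D| could be derived in a second pass. Both are simplifications in this sketch.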

This is an extended version of the paper presented at the 2015 IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China.

* Jin Qian (corresponding author)

1 School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, China

2 Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China

3 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

Keywords Granular computing · Rough sets · Knowledge acquisition · MapReduce · Big data

1 Introduction

Granular computing (GrC) [1, 26, 30] is an emerging computing paradigm concerned with problem solving and information processing at multiple levels of knowledge granularity. Granular computing represents data and models in the form of granules and thereby provides hierarchical views of, and flexible solutions to, complex-structured problems. As an important model in granular computing, rough set theory [29] has been widely applied in data mining, machine learning and image processing [18, 37, 44]. Once the lower and upper approximations of data are formulated with rough sets, rule-based knowledge acquisition can be performed through attribute reduction. Attribute reduction selects significant features of symbolic data and is often carried out as a preprocessing step in knowledge acquisition. Specifically, attribute reduction aims to find a minimum subset of attributes that has the same descriptive or classification ability as the full set of attributes. Based on the selected attributes, i.e. attribute reducts, concise rules can be produced. Various attribute reduction algorithms have been proposed for different decision systems [5, 14, 16, 22, 40–42, 48, 52, 56]. Generally speaking, the attribute reduction in these algorithms is implemented
