Parallel knowledge acquisition algorithms for big data using MapReduce



ORIGINAL ARTICLE

Jin Qian1,2 · Min Xia2 · Xiaodong Yue3

Received: 16 February 2016 / Accepted: 7 December 2016
© Springer-Verlag Berlin Heidelberg 2017

Abstract  With the volume of data growing at an unprecedented rate, knowledge acquisition for big data has become a new challenge. To address this issue, information granules in different hierarchical decision tables are constructed. Changes in the quantitative measures of hierarchical decision rules (support, confidence and coverage) are then discussed to explain the relationships between the condition granules and the decision granule. Four different strategies for attribute level ascension are designed. With attribute level ascension, the number of decision rules can be reduced in most cases. An efficient parallel knowledge acquisition framework for big data using MapReduce is proposed and implemented. The experimental results demonstrate that the proposed algorithms can mine hierarchical decision rules under different levels of granularity for big data.

This is an extended version of the paper presented at the 2015 IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China.

* Jin Qian [email protected]
Min Xia [email protected]
Xiaodong Yue [email protected]

1 School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, China

2 Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China

3 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China





Keywords  Granular computing · Rough sets · Knowledge acquisition · MapReduce · Big data

1 Introduction

Granular computing (GrC) [1, 26, 30] is an emerging computing paradigm concerned with problem solving and information processing at multiple levels of knowledge granularity. It represents data and models in the form of granules and thereby provides hierarchical views of, and flexible solutions to, complex-structured problems. As an important model in granular computing, rough set theory [29] has been widely applied in data mining, machine learning and image processing [18, 37, 44]. By formulating the lower and upper approximations of data with rough sets, rule-based knowledge acquisition can be performed through attribute reduction. Attribute reduction selects significant features of symbolic data and is often carried out as a preprocessing step in knowledge acquisition. Specifically, attribute reduction aims to find a minimum subset of attributes that has the same descriptive or classification ability as the full attribute set. Based on the selected attributes, i.e. the attribute reducts, concise rules can be produced. Various attribute reduction algorithms have been proposed for different decision systems [5, 14, 16, 22, 40–42, 48, 52, 56]. Generally speaking, the attribute reduction in these algorithms is implemented