Attribute Reduction Based on MapReduce Model and Discernibility Measure
This paper discusses two important problems of data reduction. The problems are related to computing reducts and the core in rough sets. The authors use the fact that the necessary information about discernibility matrices can be computed directly from data tables.
Keywords: MapReduce · Reducts · Attribute reduction

1 Introduction
Since massive data can be stored on cloud platforms, data mining for large datasets is a hot topic. Parallel computing methods are an alternative for processing large datasets and for knowledge discovery from large data. MapReduce is a distributed programming model proposed by Google for processing large datasets, so-called Big Data. Users specify the required functions Map and Reduce and an optional function Combine. Every step of the computation takes <key, value> pairs as input and produces other <key, value> pairs as output. In the first step, the Map function reads the input as a set of <key, value> pairs and applies a user-defined function to each pair. The result is a second set of intermediate <key, value> pairs, sent to the Combine or Reduce function. The Combine function is a local Reduce, which can help to reduce the final computation: it applies a second user-defined function to each intermediate key with all its associated values in order to merge and group data. The results are sorted, shuffled and sent to the Reduce function. The Reduce function merges and groups all values associated with each key and produces zero or more outputs; a minimal sketch of this processing model is given below.
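To make the Map/Combine/Reduce data flow concrete, here is a minimal, self-contained sketch of one MapReduce round simulated in plain Python. It is not the Hadoop implementation used in the paper; the word-count-style job and the names map_fn, combine_fn, reduce_fn and run_mapreduce are illustrative assumptions.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical user-defined functions (not the paper's job):
# Map: emit an intermediate <key, value> pair for every token of a record.
def map_fn(key, value):
    for token in value.split():
        yield token, 1

# Combine: a "local Reduce" that pre-aggregates values on one node.
def combine_fn(key, values):
    yield key, sum(values)

# Reduce: merge all values associated with a key into zero or more outputs.
def reduce_fn(key, values):
    yield key, sum(values)

def run_mapreduce(records, map_fn, combine_fn, reduce_fn):
    """Simulate one MapReduce round on a single machine."""
    # Map phase: apply the user-defined function to every input pair.
    intermediate = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            intermediate[k].append(v)

    # Combine phase (optional): locally merge values per intermediate key.
    combined = []
    for k, vs in intermediate.items():
        combined.extend(combine_fn(k, vs))

    # Shuffle and sort: group the combined pairs by key, then Reduce.
    combined.sort(key=itemgetter(0))
    output = []
    for k, group in groupby(combined, key=itemgetter(0)):
        output.extend(reduce_fn(k, [v for _, v in group]))
    return output

if __name__ == "__main__":
    records = [(0, "big data rough sets"), (1, "big data mapreduce")]
    print(run_mapreduce(records, map_fn, combine_fn, reduce_fn))
    # [('big', 2), ('data', 2), ('mapreduce', 1), ('rough', 1), ('sets', 1)]
```

In a real cluster the Map, Combine and Reduce steps run on different nodes and the shuffle moves data over the network; the in-memory dictionary above only mimics that grouping.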
Rough set theory is a mathematical tool for dealing with incomplete and uncertain information [6]. In decision systems, not all of the attributes are needed in the decision-making process; some of them can be removed without affecting the classification quality, and in this sense they are superfluous. One of the advantages of rough set theory is the ability to compute reductions of the set of conditional attributes, so-called reducts. In recent years, there has been some research combining MapReduce and rough set theory. In [12] a parallel method for computing rough set approximations was proposed. The authors continued their work and proposed in [13] three strategies based on MapReduce for computing approximations in incomplete information systems. In [11] a method for computing the core based on finding the positive region was proposed; the same authors also presented a parallel algorithm of attribute reduction in [10]. However, they used the MapReduce model only for splitting the dataset and parallelizing the computation with one of the traditional reduction algorithms. In [4] a design of a patient-customized healthcare system based on Hadoop with text mining for efficient disease management and prediction is proposed. In this paper we propose a parallel method MRCR (MapReduce Core and Reduct Generation) for generating the core and one reduct or superreduct, based on the distributed programming model MapReduce and rough set theory. In order to reduce the memory complexity, counting tables are used instead of a discernibility matrix to compute the discernibility measure of the dataset; a sketch of this counting-table idea is given below. The results of
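As referenced above, the key point is that a discernibility measure can be obtained from counting tables without materializing the pairwise discernibility matrix. The sketch below counts, for an attribute subset B, the object pairs with different decisions that B discerns, using only counts of (B-signature, decision) combinations. The decision-table layout, the function name discernibility_measure and this particular measure are assumptions for illustration; the exact definitions used in MRCR may differ.

```python
from collections import Counter

def discernibility_measure(rows, attrs, decision_idx=-1):
    """Count unordered object pairs with different decisions that are
    discerned by the attribute subset `attrs`, using counting tables
    instead of a discernibility matrix (illustrative formulation)."""
    n = len(rows)
    # Counting tables: decision counts, B-signature counts, joint counts.
    dec_counts = Counter(row[decision_idx] for row in rows)
    sig_counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    joint_counts = Counter(
        (tuple(row[a] for a in attrs), row[decision_idx]) for row in rows
    )

    # Pairs of objects with different decisions (regardless of attributes).
    pairs_diff_dec = (n * n - sum(c * c for c in dec_counts.values())) // 2
    # Pairs with different decisions that `attrs` fails to discern
    # (same signature on `attrs` but different decision).
    undiscerned = sum(
        (m * m - sum(c * c for (s, _), c in joint_counts.items() if s == sig)) // 2
        for sig, m in sig_counts.items()
    )
    return pairs_diff_dec - undiscerned

# Hypothetical decision table: columns a1, a2, decision.
table = [
    (0, 1, "yes"),
    (0, 1, "no"),
    (1, 0, "no"),
    (1, 1, "yes"),
]
print(discernibility_measure(table, attrs=[0, 1]))  # 3 pairs discerned by {a1, a2}
print(discernibility_measure(table, attrs=[0]))     # 2 pairs discerned by {a1} alone
```

Because only the counts per signature and per decision class are needed, such tables can be built locally on each data split and aggregated, which keeps the memory footprint far below that of an explicit discernibility matrix.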