Data Preprocessing and Data Mining as Generalization

Department of Computer Science, State University of New York, Stony Brook, NY, USA [email protected]
Departamento de Lenguajes y Sistemas Informaticos, Facultad de Informatica, U.P.M, Madrid, Spain [email protected]

Summary. We present an abstract model in which the data preprocessing and data mining proper stages of the Data Mining process are described as two different types of generalization. In the model, data mining and data preprocessing algorithms are defined as certain generalization operators. We use our framework to show that only three Data Mining operators, namely the classification, clustering, and association operators, are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. We are also able to show formally that the generalization that occurs in the preprocessing stage is different from the generalization inherent to the data mining proper stage.

1 Introduction

We build models in order to address formally notions that are expressed intuitively, or to answer questions that are formulated intuitively. We say, for example, that Data Mining generalizes data by transforming it into more general information. But what, in fact, is a generalization? When is a transformation of data a generalization, and when is it not? How does one kind of generalization differ from another? The model presented here addresses and answers these questions, even if only partially.

There are many data mining algorithms and thousands of implementations. A natural question arises: why are very different algorithms all called, for example, classification algorithms? What do they have in common? How do they differ from other algorithms? We hence build our model in such a way as to be able to define characteristics common to one type of algorithm and not to the other types. We present here three models: the generalization model (Definition 1) and its particular cases, the data mining model (Definition 12) and the preprocessing model (Definition 26).

A. Wasilewska and E. Menasalvas: Data Preprocessing and Data Mining as Generalization, Studies in Computational Intelligence (SCI) 118, 469–484 (2008) © Springer-Verlag Berlin Heidelberg 2008 www.springerlink.com

In the data mining model, each class of data mining algorithms is represented by an operator. These operators are also generalization operators of the generalization model, i.e. they capture formally the intuitive notion of information generalization. Moreover, we show (Theorem 4) that the operators belonging to one category are distinct from those of the other categories. The generalization model presented here is an extension of the model presented in [16]; preliminary versions of the data mining and preprocessing models were presented in [13, 14], respectively.

We usually view Data Mining results and present them to the user in their descriptive form, as it is the most natural form of communication. But the Data Mining process is deeply semantic in its nature. The algorithms process records (semantics) finding similarities
