Interpreting GUHA Data Mining Logic in Paraconsistent Fuzzy Logic Framework

Abstract. A natural interpretation of GUHA-style data mining logic in a paraconsistent fuzzy logic framework is introduced, and the significance of this interpretation is discussed.

Keywords: Data mining, fuzzy logic, paraconsistent logic.

1 Introduction

Classical Boolean logic is the logic of mathematics. In the pure mathematical world things are binary: a number either is prime or is not, a theorem either is proved or is not, and tertium non datur is valid. Outside mathematics, in real-world data analysis and decision making, applying Boolean logic causes anomalies: the law of the excluded middle is problematic, the classical quantifiers ∀ (for all) and ∃ (there exists) are clumsy to use, and truth and falsehood need not be each other's complements. Several non-classical logics were developed to overcome these problems. In various many-valued logics, such as mathematical fuzzy logic [16], the law of the excluded middle does not hold in general; in GUHA data mining logic [3] there are several non-classical quantifiers, e.g. 'in most cases' and 'above average'; and in paraconsistent logic [2] a statement can be unknown or contradictory as well as true or false. In this paper we explore the mutual relation of these non-classical logics. We show, in particular, how GUHA logic is related to paraconsistent fuzzy logic.

2 The GUHA Method in Data Mining

GUHA (General Unary Hypotheses Automaton), introduced in [3] and still under development, is a method for automatically generating hypotheses from empirical data, and thus a method of data mining. GUHA is a kind of automated exploratory data analysis: it systematically generates hypotheses supported by the data. The method is based on a well-defined first-order monadic logic containing generalized quantifiers on finite models. A GUHA procedure generates statements about associations between complex Boolean attributes. These attributes are constructed from the predicates corresponding to the columns of the data matrix.

F. Rossi and A. Tsoukiàs (Eds.): ADT 2009, LNAI 5783, pp. 284–293, 2009. © Springer-Verlag Berlin Heidelberg 2009
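
To make the idea of a generalized quantifier on a finite model concrete, the following minimal Python sketch evaluates an 'in most cases' style quantifier on a small 0/1 data matrix and lists the attribute pairs it supports. It is an illustration under stated assumptions, not the GUHA implementation: the column names, the data, the thresholds `p` and `base`, and all function names are hypothetical, and the attributes are single columns rather than the complex Boolean attributes a real GUHA procedure would build.

```python
from itertools import permutations

# Toy data matrix: each row is an object, each column a unary predicate (0/1).
# Column names and values are made up for illustration.
COLUMNS = ["smoker", "cough", "athlete"]
DATA = [
    (1, 1, 0),
    (1, 1, 0),
    (1, 0, 0),
    (0, 0, 1),
    (0, 1, 1),
    (1, 1, 1),
]


def four_fold(rows, i, j):
    """Count rows by the truth of column i (antecedent) and column j (succedent)."""
    a = sum(1 for r in rows if r[i] and r[j])          # both hold
    b = sum(1 for r in rows if r[i] and not r[j])      # antecedent only
    c = sum(1 for r in rows if not r[i] and r[j])      # succedent only
    d = sum(1 for r in rows if not r[i] and not r[j])  # neither holds
    return a, b, c, d


def in_most_cases(a, b, p=0.7, base=2):
    """Assumed quantifier: at least `base` supporting rows, and the succedent
    holds in at least a fraction p of the rows satisfying the antecedent."""
    return a >= base and a >= p * (a + b)


def supported_hypotheses(rows, columns, p=0.7, base=2):
    """Systematically test every ordered pair of distinct columns."""
    found = []
    for i, j in permutations(range(len(columns)), 2):
        a, b, _, _ = four_fold(rows, i, j)
        if in_most_cases(a, b, p, base):
            found.append((columns[i], columns[j]))
    return found


if __name__ == "__main__":
    for antecedent, succedent in supported_hypotheses(DATA, COLUMNS):
        print(f"{antecedent} => {succedent} (in most cases)")
```

On this toy matrix only the pairs (smoker, cough) and (cough, smoker) pass the assumed thresholds. The quantifier is a truth condition on the counts of the model rather than a classical ∀ or ∃, which is the sense in which it is generalized.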

GUHA is primarily suitable for exploratory analysis of large data sets. The processed data form a rectangular matrix in which rows correspond to the objects in the sample and each column corresponds to one investigated variable. A typical data matrix processed by GUHA has hundreds or thousands of rows and tens of columns. Exploratory analysis means that there is no single specific hypothesis to be tested against the data; rather, the aim is to get oriented in the domain of investigation, to analyze the behavior of chosen variables, the interactions among them, and so on. Such an inquiry is not blind but is guided by some general direction of research. GUHA is not suitable for testing a single hypothesis; routine statistical packages are good for this. GUHA systematically creates all hypotheses that are interesting from the point of view of a given general problem and on the basis of the given data. This is the main principle: all interesting hypothese