On the Merge of Factor Canonical Bases

Formal concept analysis (FCA) has a significant appeal as a formal framework for knowledge discovery not least because of the mathematical tools it provides for a range of data manipulations such as splits and merges. We study the computation of the canon

PDF / 599,027 Bytes
17 Pages / 430 x 660 pts Page_size
2 Downloads / 202 Views

DOWNLOAD

REPORT

` D´epartement d’informatique, UQAM, C.P. 8888, Succ. CV, Montr´eal (Qc), Canada 2 CNRS - UMR 7090 - ECP6, 175, rue Chevaleret, 75013 Paris, France

Abstract. Formal concept analysis (FCA) has a significant appeal as a formal framework for knowledge discovery not least because of the mathematical tools it provides for a range of data manipulations such as splits and merges. We study the computation of the canonical basis of a context starting from the bases of two apposed subcontexts, called factors. Improving on a previous method of ours, we provide here a deeper insight into its pivotal implication family and show it represents a relative basis. Further structural results allow for more efficient computation of the global basis, in particular, the relative one admits, once added to factor bases, an inexpensive reduction. A method implementing the approach as well as a set of further combinatorial optimizations is shown to outperform NextClosure on at least one dataset.

1 Introduction Since the early eighties, formal concept analysis (FCA) [10] has developed as a series of technics for representing and structuring qualitative data. Albeit rooted in lattice theory [3], it has produced a strong impact in data analysis and later on in data mining, generating a large body of work on closed-itemset representations [16, 25, 17] and association rule bases [19, 24], within the association rule mining (ARM) discipline [1]. This impact is a direct consequence of the double concern within FCA with both the concept systems (formalized as lattices) and the implication families stemming out of a tableau-shaped binary relation (formal context). The duality between lattices (equivalently in the finite case semilattices/closure operators, etc.) and implications (also called rules/dependencies) can be simply spelled: in extracting a semilattice out of a Boolean lattice, the more elements one removes, the more implications are generated and vice versa. Hence semilattices and implicational systems represent two sides of a same reality indicating, roughly speaking, what is existing/missing in the data. The relationship between lattices and implications has always been a key concern in FCA [23]. Noteworthily, this duality is known and used in the theory of relational databases [5] and in artificial intelligence [13, 12]. For instance, in database normalization theory, the computation of a minimal cover of a family of functional dependencies is a major task whose resolution makes extensive use of that duality [14, 15]. Usually, when dealing with combinatorial objects, a first concern is to put them in some “canonical” form. For a lattice such a form is the reduced standard context that only comprises irreducible intents from the semi-lattice. The implicational system can in turn be reduced to a basis of minimal size whereby the canonical basis [11] (called R. Medina and S. Obiedkov (Eds.): ICFCA 2008, LNAI 4933, pp. 182–198, 2008. c Springer-Verlag Berlin Heidelberg 2008

On the Merge of Factor Canonical Bases

183

Duquenne-Guigues

Data Loading...

On the Merge of Factor Canonical Bases

Recommend Documents

On the canonical base property

The p-Canonical Basis

Construction of Janet Bases II. Polynomial Bases

Merge Join

Merge Designs

Merge-purge

Construction of Janet Bases I. Monomial Bases

On the Hypotheses Which Lie at the Bases of Geometry

On the Complexity of Computing Integral Bases of Function Fields

Triplet Merge Trees

Canonical Correlation

Sort-Merge Join