On the Merge of Factor Canonical Bases

Formal concept analysis (FCA) has a significant appeal as a formal framework for knowledge discovery not least because of the mathematical tools it provides for a range of data manipulations such as splits and merges. We study the computation of the canon

  • PDF / 599,027 Bytes
  • 17 Pages / 430 x 660 pts Page_size
  • 2 Downloads / 202 Views

DOWNLOAD

REPORT


` D´epartement d’informatique, UQAM, C.P. 8888, Succ. CV, Montr´eal (Qc), Canada 2 CNRS - UMR 7090 - ECP6, 175, rue Chevaleret, 75013 Paris, France

Abstract. Formal concept analysis (FCA) has a significant appeal as a formal framework for knowledge discovery not least because of the mathematical tools it provides for a range of data manipulations such as splits and merges. We study the computation of the canonical basis of a context starting from the bases of two apposed subcontexts, called factors. Improving on a previous method of ours, we provide here a deeper insight into its pivotal implication family and show it represents a relative basis. Further structural results allow for more efficient computation of the global basis, in particular, the relative one admits, once added to factor bases, an inexpensive reduction. A method implementing the approach as well as a set of further combinatorial optimizations is shown to outperform NextClosure on at least one dataset.

1 Introduction Since the early eighties, formal concept analysis (FCA) [10] has developed as a series of technics for representing and structuring qualitative data. Albeit rooted in lattice theory [3], it has produced a strong impact in data analysis and later on in data mining, generating a large body of work on closed-itemset representations [16, 25, 17] and association rule bases [19, 24], within the association rule mining (ARM) discipline [1]. This impact is a direct consequence of the double concern within FCA with both the concept systems (formalized as lattices) and the implication families stemming out of a tableau-shaped binary relation (formal context). The duality between lattices (equivalently in the finite case semilattices/closure operators, etc.) and implications (also called rules/dependencies) can be simply spelled: in extracting a semilattice out of a Boolean lattice, the more elements one removes, the more implications are generated and vice versa. Hence semilattices and implicational systems represent two sides of a same reality indicating, roughly speaking, what is existing/missing in the data. The relationship between lattices and implications has always been a key concern in FCA [23]. Noteworthily, this duality is known and used in the theory of relational databases [5] and in artificial intelligence [13, 12]. For instance, in database normalization theory, the computation of a minimal cover of a family of functional dependencies is a major task whose resolution makes extensive use of that duality [14, 15]. Usually, when dealing with combinatorial objects, a first concern is to put them in some “canonical” form. For a lattice such a form is the reduced standard context that only comprises irreducible intents from the semi-lattice. The implicational system can in turn be reduced to a basis of minimal size whereby the canonical basis [11] (called R. Medina and S. Obiedkov (Eds.): ICFCA 2008, LNAI 4933, pp. 182–198, 2008. c Springer-Verlag Berlin Heidelberg 2008 

On the Merge of Factor Canonical Bases

183

Duquenne-Guigues