Alternative Method for Incrementally Constructing the FP-Tree

The FP-tree is an effective data structure that facilitates the mining of frequent patterns from transactional databases. But, transactional databases are dynamic in general, and hence modifications on the database must be reflecting onto the FP-tree. Con

  • PDF / 494,172 Bytes
  • 17 Pages / 439.18 x 665.994 pts Page_size
  • 32 Downloads / 209 Views

DOWNLOAD

REPORT


Department of Computer Science, University of Calgary, Calgary, AB, Canada Department of Computer Science, Global University, Beirut, Lebanon

Summary. The FP-tree is an effective data structure that facilitates the mining of frequent patterns from transactional databases. But, transactional databases are dynamic in general, and hence modifications on the database must be reflecting onto the FP-tree. Constructing the FP-tree from scratch and incrementally updating the FP-tree are two possible choices. However, from scratch construction turns unfeasible as the database size increases. So, this chapter addresses incremental update by extending the FP-tree concepts and manipulation process. Our new approach is capable of handling all kinds of changes, include additions, deletions and modifications. The target is achieved by constructing and incrementally dealing with the complete FP-tree, i.e., with one as the minimum support threshold. Constructing the complete FP-tree has the other advantage that it provides the freedom of mining for lower minimum support values without the need to reconstruct the tree. However, directly reflecting the changes onto the FP-tree may invalidate the basic FP-tree structure. Thus, we apply a sequence of shuffling and merging operations to validate and maintain the modified tree. The experiments conducted on synthetic and real datasets clearly highlight advantages of the proposed incremental approach over constructing the FP-tree from scratch.

1 Introduction Data mining is the process of discovering and predicting hidden and unknown knowledge by analyzing known databases. It is different from querying in the sense that querying is a retrieval process, while mining is a discovery process. Data mining has received considerable attention over the past decades and a number of effective mining techniques already exist. They are well investigated and documented in the literature. However, existing and emerging applications of data mining motivated the development of new techniques and the extension of existing ones to adapt to the change. Data mining has several applications, including market analysis, pattern recognition, gene expression data analysis, spatial data analysis, among others. Muhaimenul et al.: Alternative Method for Incrementally Constructing the FP-Tree, Studies in Computational Intelligence (SCI) 109, 361–377 (2008) c Springer-Verlag Berlin Heidelberg 2008 www.springerlink.com 

362

Muhaimenul et al.

Association rules mining is a technique that investigates the correction between items within the transactions of a given database. Explicitly, given a database of transactions, such that each transaction contains a set of items, the association rules mining process determines correlations of the form X ⇒ Y , such that X and Y are disjoint sets of items from the investigated database. A correlation X ⇒ Y is characterized by support and confidence. The former refers to the percentage of transactions that contain all items in X ∪Y by considering all the transactions in the database; and th