Mining frequent itemsets using the N-list and subsume concepts

PDF / 1,803,464 Bytes
13 Pages / 595.276 x 790.866 pts Page_size
24 Downloads / 219 Views

ORIGINAL ARTICLE

Mining frequent itemsets using the N-list and subsume concepts Bay Vo • Tuong Le • Frans Coenen Tzung-Pei Hong

•

Received: 27 December 2013 / Accepted: 2 April 2014 / Published online: 26 April 2014 Springer-Verlag Berlin Heidelberg 2014

Abstract Frequent itemset mining is a fundamental element with respect to many data mining problems directed at finding interesting patterns in data. Recently the PrePost algorithm, a new algorithm for mining frequent itemsets based on the idea of N-lists, which in most cases outperforms other current state-of-the-art algorithms, has been presented. This paper proposes an improved version of PrePost, the N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI) algorithm that uses a hash table to enhance the process of creating the N-lists associated with 1-itemsets and an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the ‘‘subsume index’’ of frequent 1-itemsets based on the N-list concept. Using the subsume index, NSFI can identify groups of frequent itemsets without determining the N-list associated with them. The experimental results show that NSFI outperforms PrePost in This paper is a expanded version of the paper ‘‘A hybrid approach for mining frequent itemsets’’ [32] presented in IEEE International Conference on Systems, Man, and Cybernetics 2013. B. Vo T. Le (&) Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam e-mail: [email protected]

terms of runtime and memory usage and outperforms dEclat in terms of runtime. Keywords Data mining Pattern mining Frequent itemset N-list Subsume

1 Introduction Itemset mining is an important problem within the field of data mining. Currently, there are many variations of itemset mining such as frequent itemset mining [1, 23], frequent closed itemset mining [25, 35], frequent weighted itemset mining [29], constrained itemset mining [5], erasable itemset mining [8, 16, 17] and so on. However, frequent itemset mining is the most popular. Frequent itemset mining plays an important role in association rule mining [2, 22, 30, 33], sequential mining [3, 4, 12, 14, 26], classification [6, 7, 18–21, 24, 28, 36, 37]. Currently, there are a large number of algorithms which effectively mine frequent itemsets. They may be divided into three main groups: 1.

B. Vo e-mail: [email protected] F. Coenen Department of Computer Science, University of Liverpool, Liverpool, UK e-mail: [email protected] T.-P. Hong Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan e-mail: [email protected]

2.

Methods that use a candidate generate-and-test strategy: These methods use a level-wise approach for mining frequent itemsets. First, they generate frequent 1-itemsets which are then used to generate candidate 2-itemsets, and so on until no more candidates can be generated. Apriori [2] and BitTableFI [11] are exemplar algorithms. Methods that adopt a divide-and-conquer

Data Loading...

Mining frequent itemsets using the N-list and subsume concepts

Recommend Documents

Approximation of Frequent Itemsets

Frequent Itemsets and Association Rules

Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Minimizing Frequent Itemsets Using Hybrid ABCBAT Algorithm

Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy

Frequent Concepts

Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

Mining Cross-Level High Utility Itemsets

Frequent Itemset Mining

Constrained Frequent Itemset Mining

Frequent Set Mining with Constraints

Frequent Itemset Mining with Constraints