Mining frequent itemsets using the N-list and subsume concepts
- PDF / 1,803,464 Bytes
- 13 Pages / 595.276 x 790.866 pts Page_size
- 24 Downloads / 208 Views
ORIGINAL ARTICLE
Mining frequent itemsets using the N-list and subsume concepts Bay Vo • Tuong Le • Frans Coenen Tzung-Pei Hong
•
Received: 27 December 2013 / Accepted: 2 April 2014 / Published online: 26 April 2014 Springer-Verlag Berlin Heidelberg 2014
Abstract Frequent itemset mining is a fundamental element with respect to many data mining problems directed at finding interesting patterns in data. Recently the PrePost algorithm, a new algorithm for mining frequent itemsets based on the idea of N-lists, which in most cases outperforms other current state-of-the-art algorithms, has been presented. This paper proposes an improved version of PrePost, the N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI) algorithm that uses a hash table to enhance the process of creating the N-lists associated with 1-itemsets and an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the ‘‘subsume index’’ of frequent 1-itemsets based on the N-list concept. Using the subsume index, NSFI can identify groups of frequent itemsets without determining the N-list associated with them. The experimental results show that NSFI outperforms PrePost in This paper is a expanded version of the paper ‘‘A hybrid approach for mining frequent itemsets’’ [32] presented in IEEE International Conference on Systems, Man, and Cybernetics 2013. B. Vo T. Le (&) Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam e-mail: [email protected]
terms of runtime and memory usage and outperforms dEclat in terms of runtime. Keywords Data mining Pattern mining Frequent itemset N-list Subsume
1 Introduction Itemset mining is an important problem within the field of data mining. Currently, there are many variations of itemset mining such as frequent itemset mining [1, 23], frequent closed itemset mining [25, 35], frequent weighted itemset mining [29], constrained itemset mining [5], erasable itemset mining [8, 16, 17] and so on. However, frequent itemset mining is the most popular. Frequent itemset mining plays an important role in association rule mining [2, 22, 30, 33], sequential mining [3, 4, 12, 14, 26], classification [6, 7, 18–21, 24, 28, 36, 37]. Currently, there are a large number of algorithms which effectively mine frequent itemsets. They may be divided into three main groups: 1.
B. Vo e-mail: [email protected] F. Coenen Department of Computer Science, University of Liverpool, Liverpool, UK e-mail: [email protected] T.-P. Hong Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan e-mail: [email protected]
2.
Methods that use a candidate generate-and-test strategy: These methods use a level-wise approach for mining frequent itemsets. First, they generate frequent 1-itemsets which are then used to generate candidate 2-itemsets, and so on until no more candidates can be generated. Apriori [2] and BitTableFI [11] are exemplar algorithms. Methods that adopt a divide-and-conquer
Data Loading...