High Utility Itemsets Mining Based on Divide-and-Conquer Strategy

PDF / 2,393,345 Bytes
19 Pages / 439.37 x 666.142 pts Page_size
31 Downloads / 252 Views

High Utility Itemsets Mining Based on Divide‑and‑Conquer Strategy Jiyong Liao1 · Sheng Wu1 · Ailian Liu1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract High utility itemsets mining has become a hot research topic in association rules mining. But many algorithms directly mine datasets, and there is a problem on dense datasets, that is, too many itemsets stored in each transaction. In the process of mining association rules, it takes a lot of storage space and affects the running efficiency of the algorithm. In the existing algorithms, there is a lack of efficient itemset mining algorithms for dense datasets. Aiming at this problem, a high utility itemsets mining algorithm based on divideand-conquer strategy is proposed. Using the improved silhouette coefficient to select the best K-means cluster number, the datasets are divided into many smaller subclasses. Then, the association rules mining is performed by Boolean matrix compression operation on each subclass, and iteratively merge them to get the final mining results. We also analyze the time complexity of our method and Apriori algorithm. Finally, experimental results on several well-known real world datasets are conducted to show that the improved algorithm performs faster and consumes less memory on dense datasets, which can effectively improve the computational efficiency of the algorithm. Keywords Association rule mining · Divide-and-conquer strategy · K-means clustering · Dense datasets · Boolean matrix compression

1 Introduction Association rule mining is one of the most fundamental research direction in data mining, the task is to extract useful information from massive data to help people make decisions quickly. An enormous variety of association rule mining algorithms have been proposed over the past few decades, like FP-Growth [1], UP-Growth [2], UP-Growth+ [3]. One of * Jiyong Liao [email protected] Sheng Wu [email protected] Ailian Liu [email protected] 1

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

13

Vol.:(0123456789)

J. Liao et al.

the most famous one is Apriori algorithm [4], its core idea is to generate frequent itemsets by scanning transaction databases, then generate association rules based on frequent itemsets. Due to Apriori algorithm has the disadvantage of repeatedly scanning database, association rules mining algorithm based on matrix compression has attracted a lot of attention in the community of data scientists and many impressive results have been reported [5–9]. In the case of large amount of data, the processing of Apriori algorithm is obviously slow, so the improved Apriori algorithm based on clustering are proposed [10, 11]. In recent years, with the increasing complexity of transactions, the high-utility itemset mining (HUIM) [12] has become an emerging topic since it can reveal the highly profitable products. Wang et al. [13] presented a tree-based temporal association rule mining algorithm, the numerical

Data Loading...

High Utility Itemsets Mining Based on Divide-and-Conquer Strategy

Recommend Documents

Mining Cross-Level High Utility Itemsets

Mining High Utility Itemsets Based on Transaction Deletion

Efficient Method for Mining High-Utility Itemsets Using High-Average Utility Measure

A SAT-Based Approach for Mining High Utility Itemsets from Transaction Databases

TKU-CE: Cross-Entropy Method for Mining Top-K High Utility Itemsets

High Average Utility Itemset Mining: A Survey

An Adaptive Utility Quantification Strategy for Penetration Semantic Knowledge Mining

An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matri

High-Utility Interval-Based Sequences

Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy

Efficient Mining of Pareto-Front High Expected Utility Patterns

High-Utility Pattern Mining Theory, Algorithms and Applications