

An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matrix

Mohammad Karim Sohrabi

Received: 12 June 2018 / Accepted: 30 June 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract High utility itemset mining is an important extension of frequent itemset mining that treats the unit profits and quantities of items as external and internal utilities, respectively. Because the utility function does not have the downward closure property, mining methods prune the search space with an anti-monotonic upper bound that overestimates utility. Transaction-weighted utilization (TWU) was the first such upper bound and remains one of the most important; various algorithms use it. Many high utility itemset mining methods have tried to tighten the utility upper bound and have exploited suitable pruning strategies to improve mining efficiency. Although TWU and its improved alternatives prune the search space, they still generate a significant number of candidates that are high-TWU but not high utility itemsets. Calculating the actual utilities of these low utility candidates requires multiple scans of the dataset, and the resulting overhead can cancel out the pruning benefit of the upper bounds. Appropriate pruning strategies, efficient data structures, and tight anti-monotonic upper bounds can overcome this problem and significantly improve the performance of high utility itemset mining methods. This paper introduces a new projection-based method, called MAHI (matrix-aided high utility itemset mining), which uses a novel utility matrix-based pruning strategy, called MAprune, to improve execution time. The experimental results show that MAHI is faster than earlier algorithms.
Keywords Frequent itemset mining · High utility itemset mining · Itemset · Transaction-weighted utility · Pruning strategy
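The definitions used in the abstract can be made concrete with a small sketch. The code below is not MAHI or MAprune; it is a brute-force illustration on a hypothetical three-transaction dataset (the items, profits, quantities, and the threshold `min_util = 12` are all invented for this example). It computes itemset utility and TWU, and shows both why utility lacks downward closure and how TWU-based pruning leaves candidates that are high-TWU but not high utility.

```python
from itertools import combinations

# Hypothetical toy data, not taken from the paper.
# External utility: unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}
# Internal utility: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2},
    {"a": 1, "c": 10},
    {"b": 1, "c": 1},
]


def contains(t, itemset):
    """True if transaction t contains every item of itemset."""
    return all(i in t for i in itemset)


def utility(itemset, db):
    """Sum of profit * quantity over all transactions containing itemset."""
    return sum(
        sum(PROFIT[i] * t[i] for i in itemset)
        for t in db
        if contains(t, itemset)
    )


def transaction_utility(t):
    """Total utility of all items in one transaction."""
    return sum(PROFIT[i] * q for i, q in t.items())


def twu(itemset, db):
    """Transaction-weighted utilization: sum of the transaction utilities
    of every transaction containing itemset (an upper bound on utility)."""
    return sum(transaction_utility(t) for t in db if contains(t, itemset))


def brute_force_mine(db, min_util):
    """Enumerate all itemsets; return (high-TWU candidates, high utility itemsets)."""
    items = sorted(PROFIT)
    candidates, high_utility = [], []
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            if twu(itemset, db) >= min_util:
                candidates.append(itemset)
                # Verifying a candidate needs its exact utility (another scan).
                if utility(itemset, db) >= min_util:
                    high_utility.append(itemset)
    return candidates, high_utility


candidates, high_utility = brute_force_mine(DB, min_util=12)
print("high-TWU candidates:", candidates)
print("high utility itemsets:", high_utility)
```

In this toy dataset, {a} has utility 10 while its superset {a, c} has utility 15, so a low utility itemset can have a high utility superset and utility alone cannot be used for pruning. TWU never increases when items are added, so it is safe for pruning, but four itemsets pass the TWU threshold and only {a, c} is actually high utility.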

Mohammad Karim Sohrabi, [email protected], Department of Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran

1 Introduction
Data mining is a key artificial intelligence technique that has been used in different applications. Among several data mining approaches and strategies, association rule mining is one of the most common and important methods that has been exploited and studied by researchers. Frequent pattern mining is the major phase of the association rule mining process and plays an essential role in it. Among various types of patterns, such as sequential patterns [1] and subgraphs [2], which can be extracted from their corresponding datasets using frequent pattern mining techniques, itemsets are very important and have attracted a