A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

  • PDF / 2,336,320 Bytes
  • 31 Pages / 439.37 x 666.142 pts Page_size
  • 115 Downloads / 209 Views

DOWNLOAD

REPORT


A fast high average‑utility itemset mining with efficient tighter upper bounds and novel list structure Krishan Kumar Sethi1   · Dharavath Ramesh1 

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract High-utility itemset mining is a prominent data-mining technique where the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High averageutility itemset (HAUI) mining is an advancement over high-utility itemset mining, which introduces an unbiased measure called average utility to associate the utility of itemsets with their length. Several existing HAUI mining algorithms use various upper bounds such as average-utility upper bound, revised tighter upper bound, and looser upper bound to preserve pruning methods. However, these upper bounds overestimate the average-utility of itemsets and slow down the mining process. This paper presents a fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid the processing of unpromising candidate itemsets. Moreover, a novel list structure named recommended average-utility list (RAUL) is presented to store the averageutility and the required information for pruning. The RAUL for an itemset can be constructed by joining the RAULs of its subsets to avoid excessive database scans. We have performed substantial experiments on various benchmark datasets to evaluate the performance of the FHAIM in comparison with two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability. Keywords  High average-utility itemset mining · Pruning methods · Upper bound · Utility list structure · Data mining

* Dharavath Ramesh [email protected] Krishan Kumar Sethi [email protected] 1



Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, Jharkhand 826004, India

13

Vol.:(0123456789)



K. K. Sethi, D. Ramesh

1 Introduction Pattern mining [1–4] is a primary research subject in the field of data mining. A wide range of applications uses pattern mining techniques for data analysis and decision making, e.g., market basket analysis, text mining, etc. Pattern mining includes the discovery of interesting, useful, and valuable data objects to deduce crucial and functional information. Patterns can be categorized into various types, e.g., frequent itemsets, association rules, periodic patterns, sequential patterns, etc. Frequent Itemset Mining (FIM) [5–7] is a fundamental technique for searching for frequently appeared itemsets in a transaction dataset. An itemset is defined as frequent when its frequency (also known as ‘support’) is not lesser than a user-defined support threshold. Apriori [6] is the first FIM algorithm that generates frequent itemsets by a levelwise searching in an iterative manner. A downward closure (DC) property called ‘Apriori property’ is applied for pruning the weak candida