Fast Top-K association rule mining using rule generation property pruning


Xiangyu Liu¹ · Xinzheng Niu¹ · Philippe Fournier-Viger²

Accepted: 1 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Traditional association rule mining algorithms can have a long runtime, consume a lot of memory, and generate a huge number of rules. Browsing through numerous rules and adjusting parameters to find just enough rules is a tedious task for users, who are often only interested in the strongest rules. Hence, many recent studies have focused on mining the top-k most frequent association rules that satisfy a minimum confidence, which limits the number of rules by ranking them by frequency. Though this redefined task has many applications, the performance of current algorithms remains an issue. To address this issue, this paper presents a novel algorithm named FTARM (Fast Top-K Association Rule Miner) that efficiently finds the set of top-k association rules using a novel technique called Rule Generation Property Pruning (RGPP). This technique reduces the search space by analyzing the internal relationships between the items of the database to be mined and the parameters set by users. Furthermore, the technique uses a novel candidate pruning property to speed up the mining process. FTARM's efficiency was evaluated on various public benchmark datasets. The results show a substantial reduction in association rule mining time and memory usage, and indicate that FTARM has good scalability, which can benefit many applications.

Keywords Data mining · Association rule mining · Top-k rules · Rule expansion
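To make the task definition concrete, the following is a minimal brute-force sketch of top-k rule mining: it enumerates every rule X -> Y, keeps those whose confidence reaches a minimum threshold, and returns the k rules with the highest support. It is not FTARM and does not implement RGPP or any pruning. The function names `support` and `top_k_rules`, the toy transactions, and the tie-breaking order are assumptions introduced only for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def top_k_rules(transactions, k, minconf):
    """Naive baseline (hypothetical helper, not the paper's algorithm).

    Enumerates all rules X -> Y with X and Y disjoint and non-empty,
    keeps those with confidence >= minconf, and returns the k rules
    with the highest support as (support, confidence, X, Y) tuples.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            whole = frozenset(itemset)
            sup = support(whole, transactions)
            if sup == 0:
                continue  # no transaction contains this itemset
            for r in range(1, size):
                for antecedent in combinations(itemset, r):
                    x = frozenset(antecedent)
                    # support(x) >= sup > 0, so the division is safe
                    conf = sup / support(x, transactions)
                    if conf >= minconf:
                        y = whole - x
                        rules.append(
                            (sup, conf, tuple(sorted(x)), tuple(sorted(y)))
                        )
    # Assumed tie-breaking: higher confidence first, then alphabetical order
    rules.sort(key=lambda rule: (-rule[0], -rule[1], rule[2], rule[3]))
    return rules[:k]


if __name__ == "__main__":
    toy = [
        {"bread", "milk"},
        {"bread", "milk", "eggs"},
        {"bread", "eggs"},
        {"milk"},
    ]
    for sup, conf, x, y in top_k_rules(toy, k=2, minconf=0.6):
        print(f"{x} -> {y}  support={sup}  confidence={conf:.2f}")
```

The exhaustive enumeration grows exponentially with the number of items, which is the kind of cost that search-space pruning techniques aim to avoid.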

¹ School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

² School of Humanities and Social Sciences, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China

1 Introduction

Data mining techniques [28, 29] are applied in numerous domains to discover insightful, novel, and interesting patterns, as well as to build understandable, descriptive, and predictive models from large volumes of data. Data mining encompasses various algorithms such as principal component analysis, clustering, sequence detection, association rule mining, and classification [40]. Association rule mining (ARM) is an unsupervised data mining task that consists of extracting interesting associations and frequent patterns from sets of items in a transaction database or other data repositories [38]. ARM is often considered an exploratory data mining technique, as it can be used to find associations between values that help to better understand the data. Association rules have a wide range of applications in many fields. A well-known application of association rules is in the business field [19], where discovering associations between purchased products is very useful for decision-making and devising effective marketing strategies. In addi