Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy


Huong Bui 1,2 & Bay Vo 3 & Tu-Anh Nguyen-Hoang 1,2 & Unil Yun 4

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The problem of mining frequent weighted itemsets (FWIs) extends the problem of mining frequent itemsets (FIs) by considering not only how frequently items occur but also their relative importance in a dataset. However, like mining FIs, mining FWIs usually produces a large result set, which creates redundancy and makes it difficult to extract rules. Mining frequent weighted closed itemsets (FWCIs) has been proposed as a solution to this issue: it produces a smaller result set while preserving sufficient information to extract rules. The weighted node-list (WN-list) structure is currently considered the state-of-the-art structure for mining FWIs. In this study, we first propose the definition of the WN-list ancestral operation and a theorem that serves as the theoretical basis for eliminating unsatisfactory candidates. We then propose an efficient algorithm, named NFWCI, for mining FWCIs using the WN-list and an early pruning strategy. Experimental results on many sparse and dense datasets show that the proposed algorithm outperforms the state-of-the-art algorithm for mining FWCIs.

Keywords Data mining · Frequent weighted closed itemsets · Weighted support · WN-list structure

* Bay Vo [email protected]
Huong Bui [email protected]
Tu-Anh Nguyen-Hoang [email protected]
Unil Yun [email protected]

1 University of Information Technology, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
4 Department of Computer Engineering, Sejong University, Seoul, Republic of Korea

1 Introduction

Data mining [1–6] focuses on finding anomalies, patterns, and correlations in large datasets to predict outcomes. Retail and financial companies often use data mining to analyze data and predict customer demand in order to increase revenues, cut costs, improve customer relationships, and reduce risk. Mining frequent patterns is a fundamental research area in data mining, which focuses on finding patterns that occur frequently in a large dataset. The problem of mining frequent patterns includes subproblems such as mining frequent itemsets (FIs) [7, 8], mining frequent sequences [9, 10], and mining frequent subgraphs [11]. Mining FIs is the problem of finding, in a transactional dataset, the sets of items that appear together a number of times, called the support, that is greater than or equal to a given threshold, called the minimum support. The FIs found are used to mine association rules [1, 4, 12] to analyze and predict trends and customer needs. However, mining FIs has two drawbacks in real-life applications. The first drawback is that all items are considered equally important, while in fact, items are often of di