Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Recently, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset

PDF / 531,954 Bytes
15 Pages / 439.37 x 666.142 pts Page_size
16 Downloads / 247 Views

DOWNLOAD

REPORT

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China [email protected], [email protected] 2 School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China [email protected] 3 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan [email protected] 4 Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Abstract. Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Recently, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have diﬀerent weight/importance, and the UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, a two-phase Aprioribased approach called HEWI-Uapriori was proposed to consider both item weight and uncertainty to mine the high expected weighted itemsets (HEWIs), while it generates a large amount of candidates and is too time-consuming. In this paper, a more eﬃcient algorithm named HEWIUtree is developed to eﬃciently mine HEWIs without performing multiple database scans and without generating enormous candidates. It relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is eﬃcient than traditional methods of WFIM and UFIM, as well as the HEWI-Uapriori algorithm, in terms of runtime, memory usage, and scalability. Keywords: Frequent itemsets frequent itemsets · WP-table

1

·

Uncertain databases

·

Weighted

Introduction

Knowledge Discovery in Database (KDD) aims at ﬁnding the meaningful and useful information from the amounts of mass data; frequent itemset mining (FIM) and c Springer International Publishing Switzerland 2016 P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 236–250, 2016. DOI: 10.1007/978-3-319-41920-6 18

Eﬃcient Mining of Weighted Frequent Itemsets in Uncertain Databases

237

association rule mining (ARM) [4,8] are the important and fundamental issues in KDD. However, the limitations on most FIM algorithms are stated below. The ﬁrst limitation is that previous studies generally ignore the fact that data collected in real-life applications such as wireless sensor network or location based-service, in which the data may be inaccurate, imprecise or incomplete [1,2,5]. In recent years, discovering patterns in uncertain databases has become a critical issue. Numerous algorithms have been designed for the discovery of frequent patterns in uncertain databases [1,2,5,7,15], which can be generally classiﬁed into two categories as the expected support model [1,7] and the probabilisti

Data Loading...

Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Recommend Documents

Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

Approximation of Frequent Itemsets

Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy

Frequent Itemsets and Association Rules

Mining frequent itemsets using the N-list and subsume concepts

UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams

Uncertain Databases

Top-k Frequent Itemsets Publication of Uncertain Data Based on Differential Privacy

Minimizing Frequent Itemsets Using Hybrid ABCBAT Algorithm

Temporally Uncertain Databases

Mining Cross-Level High Utility Itemsets

Frequent Itemset Mining