Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Recently, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset

  • PDF / 531,954 Bytes
  • 15 Pages / 439.37 x 666.142 pts Page_size
  • 16 Downloads / 221 Views

DOWNLOAD

REPORT


School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China [email protected], [email protected] 2 School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China [email protected] 3 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan [email protected] 4 Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Abstract. Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Recently, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weight/importance, and the UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, a two-phase Aprioribased approach called HEWI-Uapriori was proposed to consider both item weight and uncertainty to mine the high expected weighted itemsets (HEWIs), while it generates a large amount of candidates and is too time-consuming. In this paper, a more efficient algorithm named HEWIUtree is developed to efficiently mine HEWIs without performing multiple database scans and without generating enormous candidates. It relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is efficient than traditional methods of WFIM and UFIM, as well as the HEWI-Uapriori algorithm, in terms of runtime, memory usage, and scalability. Keywords: Frequent itemsets frequent itemsets · WP-table

1

·

Uncertain databases

·

Weighted

Introduction

Knowledge Discovery in Database (KDD) aims at finding the meaningful and useful information from the amounts of mass data; frequent itemset mining (FIM) and c Springer International Publishing Switzerland 2016  P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 236–250, 2016. DOI: 10.1007/978-3-319-41920-6 18

Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

237

association rule mining (ARM) [4,8] are the important and fundamental issues in KDD. However, the limitations on most FIM algorithms are stated below. The first limitation is that previous studies generally ignore the fact that data collected in real-life applications such as wireless sensor network or location based-service, in which the data may be inaccurate, imprecise or incomplete [1,2,5]. In recent years, discovering patterns in uncertain databases has become a critical issue. Numerous algorithms have been designed for the discovery of frequent patterns in uncertain databases [1,2,5,7,15], which can be generally classified into two categories as the expected support model [1,7] and the probabilisti