Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

ORIGINAL RESEARCH

Samar Wazir¹ • M. M. Sufyan Beg² • Tanvir Ahmad¹

Received: 27 June 2018 / Accepted: 17 April 2019
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2019

Abstract Frequent Itemset Mining can be performed with sequential algorithms such as Apriori on a standalone system, or with parallel algorithms such as Count Distribution on a distributed system. Because of the communication overhead of parallel algorithms and the exponential growth of candidate itemsets, many algorithms have been developed for computing frequent items over either a certain or an uncertain database. Yet no algorithm developed so far meets the requirement of generating frequent itemsets by combining both kinds of database. We earlier proposed the MasterApriori algorithm, which calculates Approximate Frequent Items for a combination of certain and uncertain databases, using Apriori for the certain database and Expected Support-based UApriori for the uncertain database. In this paper, we extend that work by using Poisson and Normal Distribution-based UApriori for the uncertain database. In the proposed algorithms, the sites across which the data is distributed communicate only once, which reduces the communication overhead. The scalability and efficiency of the proposed algorithms are evaluated on standard and synthetic databases, and their performance is measured by comparing the time taken and the number of frequent items generated by each algorithm.
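The abstract contrasts Expected Support-based UApriori with Poisson and Normal Distribution-based UApriori, but this excerpt does not give the authors' formulas. The following Python sketch therefore follows the formulation common in the probabilistic frequent itemset literature, under these assumptions: items occur independently, so the probability that a transaction contains an itemset X is the product of its items' existential probabilities; the support of X is then approximated either by a Poisson distribution whose mean is the expected support, or by a Normal distribution with mean Σp and variance Σp(1 − p), using a continuity correction. The function names, the one-dict-per-transaction layout and the toy values are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical layout: each UTDB transaction maps an item to its existential probability.
Transaction = Dict[str, float]


def containment_probs(db: List[Transaction], itemset: Sequence[str]) -> List[float]:
    """Pr(itemset ⊆ t) for every transaction t, assuming items are independent."""
    probs = []
    for t in db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # an absent item has probability 0
        probs.append(p)
    return probs


def expected_support(probs: List[float]) -> float:
    """Mean of the support distribution: the sum of the containment probabilities."""
    return sum(probs)


def poisson_frequentness(probs: List[float], minsup: int) -> float:
    """Pr(support >= minsup), approximating support by Poisson(expected support)."""
    mu = expected_support(probs)
    term = math.exp(-mu)  # Poisson pmf at k = 0
    cdf = 0.0
    for k in range(minsup):  # accumulate Pr(K <= minsup - 1)
        cdf += term
        term *= mu / (k + 1)  # pmf at k + 1 from pmf at k
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding error


def normal_frequentness(probs: List[float], minsup: int) -> float:
    """Pr(support >= minsup), approximating support by a Normal distribution
    with mean sum(p) and variance sum(p * (1 - p)), with continuity correction."""
    mu = expected_support(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:  # every p is 0 or 1, so the support equals mu exactly
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard Normal CDF at z
    return 1.0 - cdf


# Hypothetical toy UTDB (values invented for illustration only).
toy_db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 0.4}, {"a": 0.7, "b": 0.6}]
probs = containment_probs(toy_db, ["a", "b"])
print(expected_support(probs), poisson_frequentness(probs, 2), normal_frequentness(probs, 2))
```

Under this sketch, an itemset would be reported as frequent when the returned probability reaches a chosen threshold (the threshold is a user parameter, not a value from the paper). The Poisson form needs only the expected support, whereas the Normal form also tracks the variance Σp(1 − p).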

✉ Samar Wazir

¹ Department of Computer Engineering, Jamia Millia Islamia, New Delhi 110025, India

² Department of Computer Engineering, Aligarh Muslim University, Aligarh 202001, India

Keywords Frequent Itemset Mining · Certain and Uncertain Transactional Database · Expected Support · Poisson Distribution · Normal Distribution · Approximate Frequent Items

1 Introduction

The process of identifying frequently bought products from a consumer database belongs to an evolving area of Data Mining known as Frequent Itemset Mining (FIM) [1–6]. The transactional database used in FIM can be an exact, or precise, record of the items a consumer has purchased, called a Certain Transactional Database (CTDB), illustrated in Table 1. It can also be a probabilistic, or imprecise, record of the items a consumer will purchase in the future, called an Uncertain Transactional Database (UTDB) [7, 8], illustrated in Table 2. Each transaction in the CTDB can be read directly: for instance, in transaction 0 the customer buys item1, item2, item3, item4 and item5. In the UTDB, by contrast, each item is present with its existential probability [7, 9], so in transaction 0 the existential probabilities, or chances, of purchasing item1, item2, item3 and item4 are 0.7, 0.2, 0.8 and 0.3, respectively. In the case of a CTDB, the existential probability of buying every item is 1.0, or 100%.
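The observation that every item in a CTDB has existential probability 1.0 suggests storing both kinds of database in one form. The Python sketch below is a minimal illustration, not the authors' implementation: it lifts a CTDB into the UTDB layout and computes the expected support of an itemset, assuming items are independent so that each transaction contributes the product of its items' probabilities. With all probabilities equal to 1.0, expected support reduces to the ordinary support count. The function names and data layout are hypothetical; the sample values are transaction 0 of the CTDB and UTDB as described in the paragraph above.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical layouts: a CTDB transaction is a collection of item names;
# a UTDB transaction maps each item name to its existential probability.
CertainTx = Iterable[str]
UncertainTx = Dict[str, float]


def certain_to_uncertain(ctdb: List[CertainTx]) -> List[UncertainTx]:
    """Lift a CTDB into UTDB form: every purchased item gets probability 1.0."""
    return [{item: 1.0 for item in tx} for tx in ctdb]


def support(ctdb: List[CertainTx], itemset: Sequence[str]) -> int:
    """Number of CTDB transactions that contain every item of the itemset."""
    target = set(itemset)
    return sum(1 for tx in ctdb if target <= set(tx))


def expected_support(utdb: List[UncertainTx], itemset: Sequence[str]) -> float:
    """Sum over transactions of Pr(itemset ⊆ tx), assuming independent items."""
    total = 0.0
    for tx in utdb:
        p = 1.0
        for item in itemset:
            p *= tx.get(item, 0.0)  # an absent item contributes probability 0
        total += p
    return total


# Transaction 0 of the CTDB and of the UTDB, as described in the text.
ctdb = [["item1", "item2", "item3", "item4", "item5"]]
utdb = [{"item1": 0.7, "item2": 0.2, "item3": 0.8, "item4": 0.3}]

print(support(ctdb, ["item1", "item3"]))
print(expected_support(certain_to_uncertain(ctdb), ["item1", "item3"]))
print(expected_support(utdb, ["item1", "item3"]))
```

For transaction 0 alone, the expected support of {item1, item3} in the UTDB is 0.7 × 0.8, approximately 0.56, while both `support` on the CTDB and `expected_support` on the lifted CTDB give a count of 1.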
