Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases
ORIGINAL RESEARCH
Samar Wazir1 • M. M. Sufyan Beg2 • Tanvir Ahmad1
Received: 27 June 2018 / Accepted: 17 April 2019
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2019
Abstract Frequent Itemset Mining can be performed with sequential algorithms such as Apriori on a standalone system, or with parallel algorithms such as Count Distribution on a distributed system. To cope with the communication overhead of parallel algorithms and with exponential candidate generation, many algorithms have been developed for mining frequent itemsets over either a certain or an uncertain database. However, no algorithm developed so far generates frequent itemsets from a combination of both kinds of database. We earlier proposed the MasterApriori algorithm, which calculates Approximate Frequent Items for a combination of certain and uncertain databases, using Apriori for the certain database and expected-support-based UApriori for the uncertain database. In this paper, we extend that work by using Poisson and Normal distribution based UApriori for the uncertain database. In the proposed algorithms, the sites where the data is distributed communicate only once, which reduces the communication overhead. The scalability and efficiency of the proposed algorithms are checked on standard and synthetic databases. Performance is measured by comparing the time taken and the number of frequent items generated by each algorithm.
Corresponding author: Samar Wazir
1 Department of Computer Engineering, Jamia Millia Islamia, New Delhi 110025, India
2 Department of Computer Engineering, Aligarh Muslim University, Aligarh 202001, India
Keywords Frequent Itemset Mining · Certain and Uncertain Transactional Database · Expected Support · Poisson Distribution · Normal Distribution · Approximate Frequent Items
1 Introduction
The process of identifying frequently purchased products from a consumer database belongs to an evolving area of Data Mining known as Frequent Itemset Mining (FIM) [1–6]. The transactional database used in FIM can be exact or precise, such as a record of the items a consumer has purchased; this is called a Certain Transactional Database (CTDB) and is illustrated in Table 1. It can also be probabilistic or imprecise, such as a record of the items a consumer will purchase in the future; this is called an Uncertain Transactional Database (UTDB) [7, 8] and is illustrated in Table 2. In the given CTDB each transaction can be read directly: for instance, in transaction 0 the customer buys item1, item2, item3, item4 and item5. In the UTDB each item is present with its existential probability [7, 9], so in transaction 0 the existential probabilities, or chances of purchase, of item1, item2, item3 and item4 are 0.7, 0.2, 0.8 and 0.3 respectively. In the case of the CTDB, the existential probability of buying every item is 1.0, or 100%.
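The relationship between the two kinds of database can be made concrete with a small sketch. The Python below is illustrative only and is not the MasterApriori algorithm: it encodes transaction 0 of the CTDB and of the UTDB as described above, and computes the expected support of an itemset under the usual expected-support definition (the sum over transactions of the product of the items' existential probabilities), which assumes that items occur independently. The names `ctdb`, `utdb`, `certain_to_uncertain` and `expected_support` are hypothetical and chosen for this example.

```python
from math import prod

# Transaction 0 of each database, as described in the text. The remaining
# rows of Tables 1 and 2 are not reproduced here.
ctdb = [{"item1", "item2", "item3", "item4", "item5"}]
utdb = [{"item1": 0.7, "item2": 0.2, "item3": 0.8, "item4": 0.3}]


def certain_to_uncertain(transactions):
    """View a certain database as an uncertain one with probability 1.0."""
    return [{item: 1.0 for item in t} for t in transactions]


def expected_support(itemset, transactions):
    """Sum, over transactions, of the product of the items' probabilities.

    Assumes items occur independently (an assumption of this sketch);
    an item missing from a transaction has probability 0.
    """
    return sum(
        prod(t.get(item, 0.0) for item in itemset) for t in transactions
    )


# In the certain view every present item has probability 1.0, so the
# expected support reduces to an ordinary occurrence count.
certain_pair = expected_support({"item1", "item3"}, certain_to_uncertain(ctdb))

# In the uncertain database the pair contributes 0.7 * 0.8 for transaction 0.
uncertain_pair = expected_support({"item1", "item3"}, utdb)
```

With only transaction 0 in each database, `certain_pair` is 1.0 (a plain count), while `uncertain_pair` is the product 0.7 × 0.8, approximately 0.56.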