Probabilistic Mining in Large Transaction Databases

1 Department of Computer Science and Engineering, Muthoot Institute of Technology and Science, Kochi, Kerala, India
2 Computer Center, University of Kerala, Trivandrum, Kerala, India

Abstract. In this era of big data analysis, mining results play a very important role, so data scientists need to be accurate with the tools, methods and procedures they use while performing rule mining. The major issues they face are incremental mining and the huge amount of time required to finish the mining task. In this context, we propose a new rule mining algorithm that mines the database with a probabilistic approach to find interesting relations. This paper also compares the new technique with the traditional Apriori, FP Growth and Eclat algorithms. The proposal has also been tested against various modified versions of these algorithms. The proposed algorithm finishes the task in O(n) in its best case and in O(n log n) in its worst case. The algorithm also considers less frequent, high-priority attributes for rule creation, which helps ensure that valid mining rules are created. The major issues of traditional algorithms are the generation of invalid rules, longer running times and high memory utilization, which this proposal could remedy. The algorithm was tested against various datasets, and the results were evaluated and compared with the traditional algorithms. The results showed a peak performance improvement.

Keywords: Probabilistic mining · Association rules · Rule mining · Apriori algorithm · Priority mining · Correlation mining

© Springer International Publishing Switzerland 2016 Y. Tan and Y. Shi (Eds.): DMBD 2016, LNCS 9714, pp. 486–494, 2016. DOI: 10.1007/978-3-319-40973-3_49

1 Introduction

Association rules are if-then statements that help uncover relationships between seemingly unrelated data [1]. Association rule mining combines statistical analysis, machine learning and database management to explore the data exhaustively and reveal the complex relationships that exist. This work aims to provide such details for analysis through valid rule creation using a probabilistic mining approach. The process begins by scanning the transactional database. During the scan, the probability of each itemset is calculated and stored in a probabilistic array. Rules are created from this array during mining. Some itemsets may be infrequent yet have a high impact in the database. Traditional algorithms prune such items at a very early stage, whereas the proposed algorithm provides an extra threshold for less frequent, high-priority itemsets, thereby keeping them available for rule creation. The proposed algorithm is compared with traditional mining algorithms such as Apriori and FP Growth, and with variations of these traditional algorithms. The performance of the algorithm is evaluated asymptotically and the result obtained shows a peak improvem
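The scan-then-mine idea described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given in this excerpt. It assumes that the "probability" of an itemset is its relative frequency across transactions, that the "probabilistic array" can be modelled as a dictionary, and that the extra threshold is a lower cut-off applied to itemsets containing a designated high-priority item. All names (`itemset_probabilities`, `select_itemsets`, `rules_from`, `priority_items`, `priority_min_prob`) are hypothetical, and the sketch makes no attempt to reproduce the O(n) or O(n log n) bounds claimed in the abstract.

```python
from collections import Counter
from itertools import combinations


def itemset_probabilities(transactions, max_size=2):
    """Scan the transactions once and return {itemset: relative frequency}.

    Assumption: "probability" means the fraction of transactions that
    contain the itemset. Only itemsets up to max_size items are counted.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {itemset: n / total for itemset, n in counts.items()}


def select_itemsets(probs, min_prob, priority_items=frozenset(),
                    priority_min_prob=0.0):
    """Keep itemsets that reach min_prob, or the lower priority_min_prob
    when they contain a high-priority item (assumed reading of the
    "extra threshold" in the text)."""
    kept = {}
    for itemset, p in probs.items():
        threshold = priority_min_prob if itemset & priority_items else min_prob
        if p >= threshold:
            kept[itemset] = p
    return kept


def rules_from(kept, probs, min_conf):
    """Build if-then rules A -> B from the kept pairs.

    Confidence is computed as P(A and B) / P(A).
    """
    rules = []
    for itemset, p in kept.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            a = frozenset([antecedent])
            conf = p / probs[a]
            if conf >= min_conf:
                (consequent,) = itemset - a
                rules.append((antecedent, consequent, conf))
    return rules
```

With a small basket dataset in which one item is rare but marked as high priority, passing that item in `priority_items` with a lower `priority_min_prob` keeps its itemsets available to `rules_from`, whereas a single `min_prob` cut-off would prune them before any rule is built.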
