Probabilistic Mining in Large Transaction Databases
Department of Computer Science and Engineering, Muthoot Institute of Technology and Science, Kochi, Kerala, India
Computer Center, University of Kerala, Trivandrum, Kerala, India
Abstract. In this era of big data analysis, mining results play a very important role, so data scientists need to be precise in the tools, methods and procedures they use for rule mining. The major issues these scientists face are incremental mining and the large amount of time required to finish the mining task. In this context, we propose a new rule mining algorithm that mines the database using a probabilistic approach to find interesting relations. This paper also compares the new technique with the traditional Apriori, FP Growth and Eclat algorithms, and the proposal has also been tested against various modified versions of these algorithms. The proposed algorithm finishes the task in O(n) time in its best case and in O(n log n) time in its worst case. The algorithm also considers less frequent, high-priority attributes for rule creation, thus ensuring that valid rules are created. The major issues with traditional algorithms are the generation of invalid rules, long running times and high memory utilization; the new proposal aims to remedy these. The algorithm was tested on various datasets, and the results were evaluated and compared with those of the traditional algorithms. The results showed a peak performance improvement.
Keywords: Probabilistic mining · Association rules algorithm · Priority mining · Correlation mining · Rule mining · Apriori
© Springer International Publishing Switzerland 2016. Y. Tan and Y. Shi (Eds.): DMBD 2016, LNCS 9714, pp. 486–494, 2016. DOI: 10.1007/978-3-319-40973-3_49
1 Introduction
Association rules are if-then statements that help uncover relationships between seemingly unrelated data [1]. Association rule mining combines statistical analysis, machine learning and database management to explore the data exhaustively and reveal the complex relationships that exist. This work aims to provide all such details for analysis through valid rule creation using a probabilistic mining approach. The process begins by scanning the transactional database. During the scan, the probability of each itemset is calculated and stored in a probabilistic array, from which rules are created during mining. Some itemsets are less frequent but have a high impact in the database. Traditional algorithms prune such items at a very early stage, whereas the proposed algorithm provides an extra threshold for less frequent, high-priority itemsets, keeping them available for rule creation. The proposed algorithm is compared with traditional mining algorithms such as Apriori and FP Growth, and with variations of these traditional algorithms. The performance of the algorithm is evaluated asymptotically, and the results obtained show a peak improvement
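The pipeline described in the introduction (scan the database, store itemset probabilities in an array, create rules from that array, and apply an extra threshold for less frequent, high-priority itemsets) can be sketched roughly in Python. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes an itemset's probability is its relative support (the fraction of transactions containing it), caps itemset size at a hypothetical `max_size`, and models the "extra threshold" as a lower, hypothetical cut-off `min_priority_prob` that applies only to itemsets containing an item from a user-supplied `priority_items` set. All function and parameter names are illustrative, and the sketch makes no attempt to reproduce the O(n) / O(n log n) bounds reported in the abstract.

```python
from collections import Counter
from itertools import combinations


def probability_array(transactions, max_size=2):
    """One pass over the database: count every itemset of size 1..max_size
    and turn the counts into probabilities.

    ASSUMPTION: an itemset's probability is its relative support, i.e. the
    fraction of transactions that contain it.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[frozenset(itemset)] += 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def select_itemsets(probs, min_prob, priority_items, min_priority_prob):
    """Keep itemsets that reach min_prob, plus less frequent itemsets that
    contain a high-priority item and reach the lower min_priority_prob.

    ASSUMPTION: this is one hypothetical reading of the paper's "extra
    threshold for less frequent high priority itemsets".
    """
    kept = {}
    for itemset, p in probs.items():
        is_priority = bool(itemset & priority_items)
        if p >= min_prob or (is_priority and p >= min_priority_prob):
            kept[itemset] = p
    return kept


def make_rules(kept, probs, min_conf):
    """Create rules X -> Y from kept itemsets with two or more items.
    Confidence is P(X and Y) / P(X), read directly from the probability array;
    single-item itemsets yield no rules because range(1, 1) is empty."""
    rules = []
    for itemset, p_xy in kept.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                conf = p_xy / probs[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, p_xy, conf))
    return rules


# Toy data (hypothetical): "caviar" is rare but treated as high priority.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk", "caviar"],
]
probs = probability_array(transactions)
kept = select_itemsets(probs, min_prob=0.5, priority_items={"caviar"},
                       min_priority_prob=0.25)
for x, y, support, conf in make_rules(kept, probs, min_conf=0.6):
    print(sorted(x), "->", sorted(y), f"support={support:.2f} conf={conf:.2f}")
```

With these thresholds, the itemset {caviar, milk} has probability 0.25, below `min_prob`, so a plain support cut-off would discard it; the priority threshold keeps it and the rule caviar -> milk (confidence 1.00) is produced. The itemset {butter, milk}, also at 0.25 but containing no priority item, is pruned. Note that the scan enumerates every subset up to `max_size` in each transaction, so its cost grows with transaction width.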