Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big D
Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, wher
- PDF / 815,670 Bytes
- 14 Pages / 439.37 x 666.142 pts Page_size
- 81 Downloads / 158 Views
2
DIA Department, University of Trieste and ICAR-CNR, Trieste, TS, Italy [email protected] Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada [email protected]
Abstract. Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, where each item in a transaction is usually associated with an existential probability expressing the likelihood of its presence in that transaction, the UF-growth algorithm captures important information about uncertain data in a UF-tree structure so that expected support can be computed for each pattern. A pattern is considered frequent if its expected support meets or exceeds the user-specified threshold. However, a challenge is that the UF-tree can be large. To handle this challenge, several algorithms use smaller trees such that upper bounds to expected support can be computed. In this paper, we examine these upper bounds, and determine which ones provide tighter upper bounds to expected support for frequent pattern mining of uncertain big data. Keywords: Uncertainty · Data analysis · Big data · Data science · Data mining
1
Introduction
Uncertain big data (e.g., [21,33,34,38]) are becoming more and more popular in modern applications [23] (e.g., social computing [20,22], data warehousing and OLAP [10]) because (big) data in real-life scenarios are typically imprecise and uncertain (e.g., [14,17,19]). Mining uncertain big data (e.g., [6,40]) is problematic due to the fact that models, techniques, and algorithms running on such data must consider uncertainty as a fundamental characteristic of big data while this challenging property is not foreseen by classical large-scale data mining approaches. As a consequence, mining uncertain big data is a first-class problem to deal with, and several interesting initiatives that focus the attention on this problem are appearing recently in active literature [12,35,45]. Among the wide class of data mining tasks [4,16,42,43], frequent pattern mining [2] is a very popular problem that has attracted the attention of a large c Springer International Publishing Switzerland 2016 J.P. Carvalho et al. (Eds.): IPMU 2016, Part II, CCIS 611, pp. 379–392, 2016. DOI: 10.1007/978-3-319-40581-0 31
380
A. Cuzzocrea and C.K. Leung
community of data miners. Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects (i.e., frequent patterns). It also serves as building blocks for various other data mining tasks such as stream mining [8,9,25,26] (which mines data that come at a high velocity), constrained mining [13], and social network mining [27,41]. Many existing algorithms mine frequent patterns from high volumes of precise data, in which users definitely know whether an item is present in, or absent from, a t
Data Loading...