Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big D

Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, wher

PDF / 815,670 Bytes
14 Pages / 439.37 x 666.142 pts Page_size
81 Downloads / 177 Views

DOWNLOAD

REPORT

2

DIA Department, University of Trieste and ICAR-CNR, Trieste, TS, Italy [email protected] Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada [email protected]

Abstract. Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, where each item in a transaction is usually associated with an existential probability expressing the likelihood of its presence in that transaction, the UF-growth algorithm captures important information about uncertain data in a UF-tree structure so that expected support can be computed for each pattern. A pattern is considered frequent if its expected support meets or exceeds the user-speciﬁed threshold. However, a challenge is that the UF-tree can be large. To handle this challenge, several algorithms use smaller trees such that upper bounds to expected support can be computed. In this paper, we examine these upper bounds, and determine which ones provide tighter upper bounds to expected support for frequent pattern mining of uncertain big data. Keywords: Uncertainty · Data analysis · Big data · Data science · Data mining

1

Introduction

Uncertain big data (e.g., [21,33,34,38]) are becoming more and more popular in modern applications [23] (e.g., social computing [20,22], data warehousing and OLAP [10]) because (big) data in real-life scenarios are typically imprecise and uncertain (e.g., [14,17,19]). Mining uncertain big data (e.g., [6,40]) is problematic due to the fact that models, techniques, and algorithms running on such data must consider uncertainty as a fundamental characteristic of big data while this challenging property is not foreseen by classical large-scale data mining approaches. As a consequence, mining uncertain big data is a ﬁrst-class problem to deal with, and several interesting initiatives that focus the attention on this problem are appearing recently in active literature [12,35,45]. Among the wide class of data mining tasks [4,16,42,43], frequent pattern mining [2] is a very popular problem that has attracted the attention of a large c Springer International Publishing Switzerland 2016 J.P. Carvalho et al. (Eds.): IPMU 2016, Part II, CCIS 611, pp. 379–392, 2016. DOI: 10.1007/978-3-319-40581-0 31

380

A. Cuzzocrea and C.K. Leung

community of data miners. Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects (i.e., frequent patterns). It also serves as building blocks for various other data mining tasks such as stream mining [8,9,25,26] (which mines data that come at a high velocity), constrained mining [13], and social network mining [27,41]. Many existing algorithms mine frequent patterns from high volumes of precise data, in which users deﬁnitely know whether an item is present in, or absent from, a t

Data Loading...

Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big D

Recommend Documents

Frequent Graph Pattern Mining with Length-Decreasing Support Constraints

Frequent Pattern Mining with Constraints

Bounds for uncertain structural problems with large-range interval parameters

Upper bounds of nodal sets for eigenfunctions of eigenvalue problems

Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Frequent Pattern

Mining Maximal Frequent Subtrees with Lists-Based Pattern-Growth Method

Cognitive Computing for Big Data Systems Over IoT Frameworks, Tools

UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams

Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

Frequent Itemset Mining

Tail Bounds for Occupancy Problems