Effective approximation of parametrized closure systems over transactional data streams

PDF / 1,769,299 Bytes
31 Pages / 439.37 x 666.142 pts Page_size
6 Downloads / 306 Views

Effective approximation of parametrized closure systems over transactional data streams Daniel Trabold1,3

· Tamás Horváth1,2,3 · Stefan Wrobel1,2,3

Received: 17 June 2018 / Revised: 9 February 2019 / Accepted: 6 July 2019 © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Abstract Strongly closed itemsets, defined by a parameterized closure operator, are a generalization of ordinary closed itemsets. Depending on the strength of closedness, the family of strongly closed itemsets typically forms a tiny subfamily of ordinary closed itemsets that is stable against changes in the input. In this paper we consider the problem of mining strongly closed itemsets from transactional data streams. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. We motivate the problem setting considered by two practical applications. In particular, we first experimentally demonstrate that the above properties, i.e., compactness and stability, make strongly closed itemsets an excellent indicator of certain types of concept drifts in transactional data streams. As a second application we consider computer-aided product configuration, a real-world problem raised by an industrial project. For this problem, which is essentially exact concept identification, we propose a learning algorithm based on a certain type of subset queries formed by strongly closed itemsets and show on real-world datasets that it requires significantly less query evaluations than a naive algorithm based on membership queries. Keywords Data streams · Itemset mining · Closed itemsets · Concept drift detection · Applications

B

Daniel Trabold [email protected] Tamás Horváth [email protected] Stefan Wrobel [email protected]

1

Fraunhofer IAIS, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

2

Department of Computer Science, University of Bonn, Bonn, Germany

3

Fraunhofer Center for Machine Learning, Sankt Augustin, Germany

123

Machine Learning

1 Introduction It is a well-known fact that closed frequent itemsets provide a compact representation of frequent itemsets (Boros et al. 2003; Pasquier et al. 1999). The concept of closedness has been generalized in Boley et al. (2009): An itemset is strongly or more precisely, Δ-closed for some integer Δ > 0, if all of its extensions result in a drop of at least Δ transactions in its support set. Clearly, Δ-closed itemsets are ordinary closed (i.e., 1-closed) for any Δ ≥ 1. With increasing Δ, the number of Δ-closed itemsets becomes usually much smaller than that of ordinary closed itemsets (Boley et al. 2009) (i.e., the number of patterns can be controlled by Δ). The most common way to control the size of the output

Data Loading...

Effective approximation of parametrized closure systems over transactional data streams

Recommend Documents

Aggregate Computation over Data Streams

A Test Paradigm for Detecting Changes in Transactional Data Streams

Accelerating ELM training over data streams

Publish/Subscribe over Streams

Fixed Parameter Tractability of Graph Deletion Problems over Data Streams

Ageing-Based Multinomial Naive Bayes Classifiers Over Opinionated Data Streams

Temporal-Logic Query Checking over Finite Data Streams

Load Shedding for Window Queries Over Continuous Data Streams

Data Streams

Data Approximation with Time-Frequency Invariant Systems

Spatio-Temporal Data Streams

Distributed Data Streams