Semi-fuzzy Splitting in Online Divisive-Agglomerative Clustering

1 LIAAD - INESC Porto L.A., Rua de Ceuta, 118 - 6 andar, 4050-190 Porto, Portugal
2 Faculty of Sciences of the University of Porto
3 Faculty of Economics of the University of Porto
[email protected], [email protected]

Abstract. The Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also has an agglomerative phase that gives it a dynamic behavior capable of detecting structural changes. However, the split decision used in the algorithm focuses on a crisp boundary between two groups, which is risky because the decision must be made from only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.

Keywords: fuzzy clustering, streaming time series, hierarchical models.
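The abstract contrasts a crisp split, where every variable must go to exactly one of the two new clusters, with a semi-fuzzy assignment. The following is a minimal sketch of that contrast under stated assumptions; it is not the rule proposed in this paper. It assumes each variable's dissimilarity to two pivot variables (d1, d2) is already known, derives a fuzzy-style membership u1 = d2 / (d1 + d2), and treats variables whose membership lies within a hypothetical `margin` of 0.5 as too ambiguous for a crisp decision, placing them in both children. The function name `semi_fuzzy_split` and the pivot setup are illustrative only.

```python
from typing import Dict, List, Tuple


def semi_fuzzy_split(
    dist_to_pivots: Dict[str, Tuple[float, float]],
    margin: float = 0.1,
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical semi-fuzzy split of variables around two pivots.

    dist_to_pivots maps a variable name to (d1, d2), its dissimilarity to
    pivot 1 and pivot 2. Membership in cluster 1 is d2 / (d1 + d2), so a
    variable equidistant from both pivots gets 0.5.
    """
    left: List[str] = []
    right: List[str] = []
    ambiguous: List[str] = []
    for name, (d1, d2) in dist_to_pivots.items():
        total = d1 + d2
        # Guard against a variable at distance 0 from both pivots.
        u1 = 0.5 if total == 0 else d2 / total
        if abs(u1 - 0.5) <= margin:
            # Too close to the boundary: defer the crisp decision.
            ambiguous.append(name)
            left.append(name)
            right.append(name)
        elif u1 > 0.5:
            left.append(name)
        else:
            right.append(name)
    return left, right, ambiguous


left, right, ambiguous = semi_fuzzy_split(
    {"a": (0.1, 0.9), "b": (0.8, 0.2), "c": (0.45, 0.55)}, margin=0.1
)
print(left, right, ambiguous)  # ['a', 'c'] ['b', 'c'] ['c']
```

With `margin=0` the function reduces to a crisp nearest-pivot split (apart from exact ties); larger margins defer more decisions, trading less compact children for a lower risk of an early wrong assignment made from little data.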

1 Introduction

The task of clustering variables over data streams, or streaming time series, has not been widely studied. Data streams usually consist of variables producing examples continuously over time. The basic idea is to find groups of variables that behave similarly through time, which is usually measured in terms of time series similarity. Clustering time series has already been studied in various fields of real-world applications, many of which, however, could benefit from a data stream approach. For example:
– in electrical supply systems, clustering demand profiles (e.g., industrial or urban) decreases the computational cost of predicting each individual subnetwork load [5,6];
– in medical systems, clustering medical sensor data (such as ECG, EEG, etc.) is useful for determining correlations between signals [14];
– in financial markets, clustering the evolution of stock prices helps prevent bankruptcy [10].

J. Neves, M. Santos, and J. Machado (Eds.): EPIA 2007, LNAI 4874, pp. 133–144, 2007.
© Springer-Verlag Berlin Heidelberg 2007
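To make "behave similarly through time" concrete, a correlation-based dissimilarity can be used, in line with ODAC's reliance on the correlation between streams. The sketch below assumes the root normalized one-minus-correlation, d(x, y) = sqrt((1 - r) / 2), which maps a Pearson correlation r in [-1, 1] to a distance in [0, 1]; the choice of this particular formula, the function names, and the load profiles are assumptions made for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def rnomc(x: Sequence[float], y: Sequence[float]) -> float:
    """Root normalized one-minus-correlation: 0 = same shape, 1 = opposite."""
    # Clamp to [-1, 1] to absorb floating-point rounding before the square root.
    r = max(-1.0, min(1.0, pearson(x, y)))
    return math.sqrt((1.0 - r) / 2.0)


# Hypothetical demand profiles sampled at the same five time points.
urban = [1.0, 2.0, 3.0, 4.0, 5.0]
industrial = [2.0, 4.0, 6.0, 8.0, 10.0]  # same shape, different scale
night_shift = [5.0, 4.0, 3.0, 2.0, 1.0]  # opposite trend

print(rnomc(urban, industrial))   # 0.0
print(rnomc(urban, night_shift))  # 1.0
```

Because correlation ignores scale and offset, `urban` and `industrial` are at distance 0 even though their magnitudes differ, which is the kind of "similar behavior" the grouping targets.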


P. Pereira Rodrigues and J. Gama

All of these problems involve data arriving from a stream at a high rate; hence, data stream approaches should be considered to solve them. In this work we address the problem of clustering streaming series, assuming data is gathered by a centralized process while it becomes available for online analysis, a setting already targeted by recent research. Wang and Wang introduced an efficient method for monitoring composite correlations, i.e., conjunctions of highly correlated pairs of streams among multiple time series [15]. They use a simple mechanism to predict the correlation values of relevant stream pairs at the next time position, using an incremental computation of the correlation, and rank the stream pairs carefully so that the pairs likely to have low correlation values are evaluated first. Beringer and Hüllermeier proposed an online v
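The incremental computation of correlation attributed to Wang and Wang [15] can be illustrated with running sufficient statistics: keeping n, Σx, Σy, Σx², Σy² and Σxy for each pair of streams lets the correlation be recomputed in constant time whenever a new pair of values arrives, without storing the series. The sketch below is a generic version of that idea, not the authors' implementation. In particular, `rank_pairs_low_first` simply orders pairs by their current correlation, a hypothetical stand-in for their predictive ranking; the class name `PairStats` and the example rows are also illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


class PairStats:
    """Running sufficient statistics for the correlation of two streams."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.syy = 0.0
        self.sxy = 0.0

    def update(self, x: float, y: float) -> None:
        # Constant-time update; past values are never stored.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def corr(self) -> float:
        if self.n < 2:
            return 0.0  # placeholder: not enough data yet
        cov = self.sxy - self.sx * self.sy / self.n
        vx = self.sxx - self.sx * self.sx / self.n
        vy = self.syy - self.sy * self.sy / self.n
        if vx <= 0.0 or vy <= 0.0:
            return 0.0  # placeholder: undefined for a constant stream
        return cov / math.sqrt(vx * vy)


def rank_pairs_low_first(stats: Dict[Tuple[str, str], PairStats]) -> List[Tuple[str, str]]:
    """Order stream pairs by current correlation, lowest first (hypothetical helper)."""
    return sorted(stats, key=lambda pair: stats[pair].corr())


# Hypothetical example: three streams observed at four time points.
names = ["a", "b", "c"]
stats = {pair: PairStats() for pair in combinations(names, 2)}
rows = [
    {"a": 1.0, "b": 2.0, "c": 5.0},
    {"a": 2.0, "b": 4.0, "c": 4.0},
    {"a": 3.0, "b": 6.0, "c": 3.0},
    {"a": 4.0, "b": 8.0, "c": 2.0},
]
for row in rows:
    for (u, v), s in stats.items():
        s.update(row[u], row[v])

print(rank_pairs_low_first(stats))  # [('a', 'c'), ('b', 'c'), ('a', 'b')]
```

Each update touches only six numbers per pair, so memory per pair stays constant regardless of stream length, although the number of pairs still grows quadratically with the number of streams. The raw-sums formula can lose precision on long streams with large values; a Welford-style update is a common, more stable alternative.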
