Semi-fuzzy Splitting in Online Divisive-Agglomerative Clustering
1 LIAAD - INESC Porto L.A., Rua de Ceuta, 118 - 6 andar, 4050-190 Porto, Portugal
2 Faculty of Sciences of the University of Porto
3 Faculty of Economics of the University of Porto
[email protected], [email protected]
Abstract. The Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also possesses an agglomerative phase, which gives it a dynamic behavior capable of detecting structural change. However, the split decision used in the algorithm focuses on the crisp boundary between two groups, which implies a high risk, since it has to decide based on only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.
Keywords: fuzzy clustering, streaming time series, hierarchical models.
1 Introduction
The task of clustering variables over data streams, or streaming time series, is not widely studied. Data streams usually consist of variables producing examples continuously over time. The basic idea behind clustering them is to find groups of variables that behave similarly through time, which is usually measured in terms of time series similarities. Clustering time series has already been studied in various fields of real-world applications. Many of them, however, could benefit from a data stream approach. For example:
– in electrical supply systems, clustering demand profiles (e.g., industrial or urban) decreases the computational cost of predicting each individual subnetwork load [5,6];
– in medical systems, clustering medical sensor data (such as ECG, EEG, etc.) is useful to determine correlation between signals [14];
– in financial markets, clustering the evolution of stock prices helps prevent bankruptcy [10].
J. Neves, M. Santos, and J. Machado (Eds.): EPIA 2007, LNAI 4874, pp. 133–144, 2007. © Springer-Verlag Berlin Heidelberg 2007
P. Pereira Rodrigues and J. Gama
All of these problems address data coming from a stream at a high rate. Hence, data stream approaches should be considered to solve them. In this work we address the problem of clustering streaming series assuming data is gathered by a centralized process as it becomes available for online analysis, a setting already targeted by recent research. Wang and Wang introduced an efficient method for monitoring composite correlations, i.e., conjunctions of highly correlated pairs of streams among multiple time series [15]. They use a simple mechanism to predict the correlation values of relevant stream pairs at the next time position, using an incremental computation of the correlation, and rank the stream pairs carefully so that the pairs that are likely to have low correlation values are evaluated first. Beringer and Hüllermeier proposed an online v
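
To make the idea of an incremental computation of the correlation concrete, the following is a minimal sketch of a running Pearson correlation between two streams, maintained from a handful of sufficient statistics that are updated once per incoming example. It is an illustration under our own assumptions, not the mechanism of Wang and Wang [15] nor the ODAC implementation; the class name `IncrementalCorrelation` and its methods are hypothetical.

```python
import math


class IncrementalCorrelation:
    """Running Pearson correlation of two streams (illustrative sketch).

    Hypothetical helper: it keeps only six sufficient statistics, so memory
    use is constant no matter how many examples have been seen.
    """

    def __init__(self):
        self.n = 0        # number of paired examples seen so far
        self.sx = 0.0     # sum of x
        self.sy = 0.0     # sum of y
        self.sxx = 0.0    # sum of x * x
        self.syy = 0.0    # sum of y * y
        self.sxy = 0.0    # sum of x * y

    def update(self, x, y):
        """Fold one new pair of observations into the statistics."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        """Return the current correlation, or 0.0 when it is undefined."""
        if self.n < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.n
        var_x = self.sxx - self.sx * self.sx / self.n
        var_y = self.syy - self.sy * self.sy / self.n
        if var_x <= 0.0 or var_y <= 0.0:
            # A constant stream has no defined correlation; 0.0 is our choice.
            return 0.0
        return cov / math.sqrt(var_x * var_y)


# Usage: two perfectly linearly related streams.
corr = IncrementalCorrelation()
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    corr.update(x, 2.0 * x + 1.0)
print(corr.value())
```

Because each update touches only six numbers, the cost per example is constant. A known caveat of this sum-of-squares form is loss of precision when values are large relative to their variance; a production system would likely prefer a numerically stabler update.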