Approximate Clustering of Time Series Using Compact Model-Based Descriptions

Clustering time series is usually limited by the fact that the length of the time series has a significantly negative influence on the runtime. On the other hand, approximative clustering applied to existing compressed representations of time series (e.g.

PDF / 687,635 Bytes
16 Pages / 430 x 660 pts Page_size
67 Downloads / 327 Views

DOWNLOAD

REPORT

Abstract. Clustering time series is usually limited by the fact that the length of the time series has a signiﬁcantly negative inﬂuence on the runtime. On the other hand, approximative clustering applied to existing compressed representations of time series (e.g. obtained through dimensionality reduction) usually suﬀers from low accuracy. We propose a method for the compression of time series based on mathematical models that explore dependencies between diﬀerent time series. In particular, each time series is represented by a combination of a set of speciﬁc reference time series. The cost of this representation depend only on the number of reference time series rather than on the length of the time series. We show that using only a small number of reference time series yields a rather accurate representation while reducing the storage cost and runtime of clustering algorithms signiﬁcantly. Our experiments illustrate that these representations can be used to produce an approximate clustering with high accuracy and considerably reduced runtime.

1

Introduction

Clustering time series data is a very important data mining task for a wide variety of application ﬁelds including stock marketing, astronomy, environmental analysis, molecular biology, and medical analysis. In such application areas the time series have usually an enormous length which has a signiﬁcantly negative inﬂuence on the runtime of the clustering process. As a consequence, a lot of research work has focused on eﬃcient methods for similarity search in and clustering of time series in the past years. Time series are sequences of discrete quantitative data assigned to speciﬁc moments in time, i.e. a time series X is a sequence of values X = x1 , . . . , xN , where xi is the value at time slot i. This sequence is often also taken as a N dimensional feature vector, i.e. X ∈ N . The performance of clustering algorithms for time series data is mainly limited by the cost required to compare pairs of time series (i.e. the processing cost of the used distance function). As indicated above, time series are usually very large containing several thousands of values per sequence. Consequently, the

Ê

J.R. Haritsa, R. Kotagiri, and V. Pudi (Eds.): DASFAA 2008, LNCS 4947, pp. 364–379, 2008. c Springer-Verlag Berlin Heidelberg 2008

Approximate Clustering of Time Series

365

comparison of two time series can be very expensive, particularly when considering the entire sequence of values of the compared objects. The most prominent approaches to measure the similarity of time series are the Euclidean distance and Dynamic Time Warping (DTW). The choice of the distance function mainly depends on the application. In some applications, the Euclidean distance produce better results whereas in other applications, DTW is superior. The big limitation of DTW is its high computational cost of O(N 2 ) while the Euclidean distance between two time series can be computed in O(N ). Since we consider large databases and long time series (i.e. large values of N ) in this paper,

Data Loading...

Approximate Clustering of Time Series Using Compact Model-Based Descriptions

Recommend Documents

Assessing Satellite Image Time Series Clustering Using Growing SOM

A hybrid shape-based image clustering using time-series analysis

Time Series Clustering with Deep Reservoir Computing

A Comparison of Multivariate Time Series Clustering Methods

Partitioning-Clustering Techniques Applied to the Electricity Price Time Series

Clustering of Time-Series Balance History Data Streams Using Apache Spark

Metaheuristics on time series clustering problem: theoretical and empirical evaluation

Unsupervised Visual Time-Series Representation Learning and Clustering

Time Series Clustering for Knowledge Discovery on Metal Additive Manufacturing

Air Quality Prediction Using Time Series Analysis

Time Series

Time Series