Robust fuzzy clustering based on quantile autocovariances

B. Lafuente-Rego¹ · P. D’Urso² · J. A. Vilar¹
Received: 17 October 2017 / Revised: 26 September 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
Robustness to the presence of outliers in time series clustering is addressed. Assuming that the clustering principle is to group realizations of series generated from similar dependence structures, three robust versions of a fuzzy C-medoids model based on comparing sample quantile autocovariances are proposed, following, respectively, the so-called metric, noise, and trimmed approaches. Each method achieves robustness against outliers in a different manner. The metric approach applies a suitable transformation of the distance aimed at smoothing the effect of the outliers, the noise approach gathers the outliers into a separate artificial cluster, and the trimmed approach removes a fraction of the time series. All the proposed approaches take advantage of the high capability of quantile autocovariances to discriminate between independent realizations from a broad range of stationary processes, including linear, non-linear and conditionally heteroskedastic models. An extensive simulation study is performed, involving scenarios with different generating models contaminated with outliers. Robustness is evaluated against (i) outliers following different generating patterns, and (ii) outliers characterized by isolated, temporary or persistent level changes. The influence of the input parameters required by the different algorithms is also analyzed. Regardless of the considered models, the results show that the proposed robust procedures are able to neutralize the effect of the anomalous series while preserving the true clustering structure, and that they outperform other robust algorithms based on alternative metrics. Two applications to financial data sets illustrate the usefulness of the proposed models.

Keywords Time series data · Robust fuzzy C-medoids clustering · Quantile autocovariances · Exponential distance · Noise cluster · Trimming
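As a rough illustration of the metric approach summarized in the abstract, the sketch below describes each series by sample quantile autocovariances at a few lags and probability levels, compares series through the squared Euclidean distance between these feature vectors, bounds that distance with the exponential transform 1 − exp(−β d), and runs a fuzzy C-medoids iteration on the result. The estimator, the lags and probability levels, the fuzzifier m, the parameter β, the farthest-first initialization, and all function names are assumptions made for illustration; this is not the authors' implementation, and the noise and trimmed variants are not covered.

```python
import numpy as np


def quantile_autocov_features(x, lags=(1, 2, 3), probs=(0.1, 0.5, 0.9)):
    """Sample quantile autocovariances (assumed estimator, for illustration).

    For lag l and levels (tau, tau'), returns the proportion of pairs with
    x_t <= q(tau) and x_{t+l} <= q(tau'), minus tau * tau'.
    """
    x = np.asarray(x, dtype=float)
    q = np.quantile(x, probs)  # empirical tau-quantiles of the whole series
    feats = []
    for l in lags:
        head, tail = x[:-l], x[l:]  # pairs (x_t, x_{t+l})
        for i, tau in enumerate(probs):
            for j, tau2 in enumerate(probs):
                joint = np.mean((head <= q[i]) & (tail <= q[j]))
                feats.append(joint - tau * tau2)
    return np.array(feats)


def exp_distance_matrix(X, beta=1.0):
    """1 - exp(-beta * squared Euclidean distance); bounded in [0, 1)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 - np.exp(-beta * d2)


def memberships(Dm, m):
    """Fuzzy C-medoids memberships from an n x C distance matrix."""
    Dm = np.maximum(Dm, 1e-12)  # avoid division by zero at the medoids
    ratio = (Dm[:, :, None] / Dm[:, None, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def robust_fcmd(X, n_clusters, m=1.5, beta=1.0, n_iter=50):
    """Hypothetical fuzzy C-medoids on the exponential distance."""
    D = exp_distance_matrix(X, beta)
    # Hypothetical deterministic start: farthest-first from object 0.
    medoids = [0]
    while len(medoids) < n_clusters:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        U = memberships(D[:, medoids], m)
        # New medoid of cluster c minimizes sum_i u_ic^m * D[i, j] over j.
        new = np.array([np.argmin((U[:, c] ** m) @ D) for c in range(n_clusters)])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, memberships(D[:, medoids], m), D


# Toy usage: five AR(1) series with phi = 0.8 and five with phi = -0.8.
rng = np.random.default_rng(1)


def ar1(phi, T=500, burn=100):
    e = rng.standard_normal(T + burn)
    x = np.zeros_like(e)
    for t in range(1, len(e)):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]  # drop the burn-in period


series = [ar1(0.8) for _ in range(5)] + [ar1(-0.8) for _ in range(5)]
X = np.vstack([quantile_autocov_features(s) for s in series])
medoids, U, D = robust_fcmd(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment by largest membership
```

Because each pairwise term is capped at 1 after the exponential transform, a series lying far from every medoid cannot dominate the objective as it would under the raw squared distance; β controls how quickly that cap is approached.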

Corresponding author: B. Lafuente-Rego

Extended author information available on the last page of the article


1 Introduction

Time series are complex data objects by nature. They usually comprise a huge number of records, present dynamic behavior patterns that might change over time, and frequently one must handle realizations of different lengths. These particularities make it more difficult to perform cluster analysis in a standard way. For example, it is not simple to determine a proper distance between time series that exhibits robustness to the dependence structure, and high dimensionality is an additional obstacle to developing efficient clustering procedures. On the other hand, time series clustering is a central problem in a broad range of applications including economics, finance, econophysics, marketing, environmental sciences, neuroscience, and biomedical sciences, among others (see