An autoencoder-based deep learning approach for clustering time series data


Neda Tavakoli1 · Sima Siami‑Namini2 · Mahdi Adl Khanghah3 · Fahimeh Mirza Soltani3 · Akbar Siami Namin4

Received: 4 November 2019 / Accepted: 23 March 2020
© Springer Nature Switzerland AG 2020

Abstract

This paper introduces a two-stage deep learning-based methodology for clustering time series data. First, a novel technique is introduced that uses characteristics of the given time series data (e.g., volatility) to create labels, thereby transforming the problem from unsupervised into supervised learning. Second, an autoencoder-based deep learning model is built to model both known and hidden non-linear features of time series data. The paper reports a case study in which selected financial and stock time series data of over 70 stock indices are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed methodology achieves 87.5% accuracy in clustering and predicting the labels for unseen time series data. The paper also reports that the performance of the two techniques (i.e., autoencoder and Kmeans) is comparable. However, a few instances of time series data are classified differently by the autoencoder-based methodology than by the Kmeans algorithm. The results may indicate that the proposed deep learning-based approach takes into account additional hidden features that might be overlooked by conventional Kmeans. This finding raises the question of whether the explicit features of data should be analyzed for clustering, or whether more advanced techniques such as deep learning need to be adopted, by which hidden features and relationships are explored for clustering purposes.

Keywords  Kmeans clustering · Financial data analysis · Time series clustering · Deep learning · Encoder–decoder · Unsupervised learning · Supervised learning · Multi-layer perceptron
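The general idea behind the first stage, deriving labels from a characteristic such as volatility, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: volatility is measured as the standard deviation of simple returns, the labels come from a plain one-dimensional Kmeans with two clusters, and the four price series and all function names are hypothetical. The second stage (the autoencoder) is not sketched.

```python
from statistics import mean, pstdev


def simple_returns(prices):
    """Period-over-period relative changes of a price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def featurize(prices):
    """Summarize one series as (mean return, volatility).

    Assumption: volatility = population standard deviation of simple returns.
    """
    r = simple_returns(prices)
    return mean(r), pstdev(r)


def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd-style Kmeans on scalars with a deterministic start."""
    ordered = sorted(values)
    # Spread the k starting centroids evenly across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy price series, invented for this example only.
series = {
    "calm_a": [100, 101, 100, 101, 100, 101],
    "calm_b": [50, 50.5, 50, 50.5, 50, 50.5],
    "wild_a": [100, 120, 90, 125, 85, 130],
    "wild_b": [20, 26, 18, 27, 17, 28],
}

volatilities = [featurize(p)[1] for p in series.values()]
labels, centroids = kmeans_1d(volatilities, k=2)
volatility_labels = dict(zip(series, labels))
```

Labels produced this way could then serve as targets for a supervised model, which is the role the abstract assigns to the created labels; how the paper itself constructs them is not specified in this section.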

1 Introduction

An important step prior to performing any detailed data analysis is to understand the characteristics of a given data set. Several statistical techniques can help in creating different levels of abstraction, each representing the data set from a different angle. A very basic and prevalent technique is descriptive statistics, such as the mean and standard deviation, which data analysts often use to grasp the trend and variation of observations and thus capture the big picture of the data. These types of metadata can describe and, more specifically, “featurize” data. Hence, in practice and theory, data analysis refers to the identification, selection, and analysis of features of data sets. A popular approach to conducting data analysis is through the conventional clustering problem, in which the given dataset is divided into subgroups. The goal is to maximize the similarity of the data observations grouped together while maximizing the dissimilarity of

*  Akbar Siami Namin, akb