An autoencoder-based deep learning approach for clustering time series data


Neda Tavakoli¹ · Sima Siami‑Namini² · Mahdi Adl Khanghah³ · Fahimeh Mirza Soltani³ · Akbar Siami Namin⁴
Received: 4 November 2019 / Accepted: 23 March 2020
© Springer Nature Switzerland AG 2020

Abstract
This paper introduces a two-stage deep learning-based methodology for clustering time series data. First, a novel technique uses characteristics of the given time series data (e.g., volatility) to create labels, which transforms the problem from unsupervised into supervised learning. Second, an autoencoder-based deep learning model is built to model both known and hidden non-linear features of the time series data. The paper reports a case study in which selected financial and stock time series data of over 70 stock indices are clustered into distinct groups using the two-stage procedure. The results show that the proposed methodology is capable of achieving 87.5% accuracy in clustering and predicting the labels of unseen time series data. The paper also reports an important finding: the performance of the two techniques (i.e., autoencoder and Kmeans) is comparable. However, a few time series are classified differently by the autoencoder-based methodology than by the Kmeans algorithm. This may indicate that the proposed deep learning-based approach takes into account additional hidden features that conventional Kmeans might overlook. The finding raises the question of whether clustering should analyze only the explicit features of the data, or whether more advanced techniques such as deep learning should be adopted to explore hidden features and relationships.

Keywords Kmeans clustering · Financial data analysis · Time series clustering · Deep learning · Encoder–decoder · Unsupervised learning · Supervised learning · Multi-layer perceptron
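The first stage described in the abstract, deriving labels from a characteristic such as volatility, can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `volatility_features` and `volatility_labels`, the use of log returns, the quantile-based binning into three groups, and the synthetic price series are all assumptions made for illustration only.

```python
import numpy as np


def volatility_features(prices):
    """Return (mean log return, volatility) for one price series.

    Volatility is taken here to be the sample standard deviation of
    log returns (an assumed definition for this sketch).
    """
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    return log_returns.mean(), log_returns.std(ddof=1)


def volatility_labels(series_list, n_groups=3):
    """Assign each series a label 0..n_groups-1 by volatility quantile.

    Hypothetical labeling rule: series are binned at the interior
    quantiles of the observed volatilities.
    """
    vols = np.array([volatility_features(p)[1] for p in series_list])
    # Interior cut points, e.g. the 1/3 and 2/3 quantiles for 3 groups.
    cuts = np.quantile(vols, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    labels = np.searchsorted(cuts, vols, side="right")
    return labels, vols


# Synthetic example: six random-walk price series with three noise levels.
rng = np.random.default_rng(0)
scales = [0.005, 0.005, 0.02, 0.02, 0.05, 0.05]
series = [100.0 * np.exp(np.cumsum(rng.normal(0.0, s, 250))) for s in scales]

labels, vols = volatility_labels(series)
print(labels)
```

Labels produced this way could then serve as targets for a supervised model, which is the role the abstract assigns to the autoencoder-based stage.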

1 Introduction
An important step prior to any detailed data analysis is to understand the characteristics of the given data set. Several statistical techniques can help create different levels of abstraction, each representing the data set from a different angle. A basic and prevalent technique is descriptive statistics, such as the mean and standard deviation, which data analysts often use to grasp the trend and variation of observations and thus capture the big picture of the data. These types of metadata can describe and, more specifically, “featurize” data. Hence, in practice and theory, data analysis refers to the identification, selection, and analysis of features of data sets. A popular approach to data analysis is the conventional clustering problem, in which the given data set is divided into subgroups. The goal is to maximize the similarity of the data observations grouped together while maximizing the dissimilarity of

* Akbar Siami Namin, akb
