Introducing time series snippets: a new primitive for summarizing long time series


Shima Imani1 · Frank Madrid1 · Wei Ding2 · Scott E. Crouter3 · Eamonn Keogh1

Received: 3 June 2019 / Accepted: 20 June 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

The first question a data analyst asks when confronting a new dataset is often, “Show me some representative/typical data.” Answering this question is simple in many domains, with random samples or aggregate statistics of some kind. Surprisingly, it is difficult for large time series datasets. The major difficulty is not time or space complexity, but defining what it means to be representative data for this data type. In this work, we show that the obvious candidate definitions: motifs, shapelets, cluster centers, random samples etc., are all poor choices. We introduce time series snippets, a novel representation of typical time series subsequences. Informally, time series snippets can be seen as the answer to the following question. If a user, which could be a human or a higher-level algorithm, only has resources (including human time) to inspect k subsequences of a long time series, which k subsequences should be chosen? Beyond their utility for visualizing and summarizing massive time series collections, we show that time series snippets have utility for high-level comparison of large time series collections.

Responsible editor: Panagiotis Papapetrou.

* Shima Imani [email protected]
Frank Madrid [email protected]
Wei Ding [email protected]
Scott E. Crouter [email protected]
Eamonn Keogh [email protected]

1 University of California, Riverside, Riverside, USA
2 Department of Computer Science, University of Massachusetts Boston, Boston, USA
3 Department of Kinesiology, Recreation, and Sport Studies, The University of Tennessee Knoxville, Knoxville, USA




Keywords  Time series · Summarization · Motifs · Sampling · Diversification

1 Introduction

In many domains, a common analytical query is, “Show me some representative/typical data.” This query might be issued by a human attempting to explore a massive archive, or by an algorithm as a subroutine in some higher-level analytics. There are definitions and algorithms to answer this question for a plethora of data types, including images (Wang et al. 2012), sets (Pan et al. 2005), words (Salmenkivi 2006), graphs (Langohr and Toivonen 2012), videos (Elhamifar et al. 2012), tweets (Rosa et al. 2011), etc. Surprisingly, to the best of our knowledge, the problem of finding representative time series subsequences has not been solved, despite the ubiquity of time series in almost all human endeavors. Moreover, as we will show, the obvious candidates for this task (motifs, shapelets, cluster centers, and random samples) will not generally produce meaningful results. We propose an algorithm to discover such representative patterns, which we will call time series snippets, or just snippets,