An improved algorithm for segmenting online time series with error bound guarantee

  • PDF / 1,277,842 Bytes
  • 10 Pages / 595.276 x 790.866 pts Page_size
  • 103 Downloads / 180 Views

DOWNLOAD

REPORT


ORIGINAL ARTICLE

An improved algorithm for segmenting online time series with error bound guarantee Huan-yu Zhao • Guang-xia Li • Hao-lan Zhang Yun Xue



Received: 7 June 2014 / Accepted: 27 October 2014 Ó Springer-Verlag Berlin Heidelberg 2014

Abstract In many real application, the volume of time series data increases seriously. How to store and process data becomes more interesting and challenge things. Effective representations can make storage less, processing more easily. In this paper, we contribute to construct a new piecewise linear approximation algorithm for segmenting online time series with error bound guarantee. To beat our targets, we combine a disconnected segment strategy into Feasible Space Window method, and to test our algorithm, we compare with algorithms that adopts the above strategies on both real and synthetic data sets. The time complexity of our algorithm is O(n) and the number of segments is smaller than FSW algorithm on all tested data sets.

F pk pek pi pi pstart pnext pt e 0 pt e u l ns pt e pt e

A set of linear line The k-th data point in time series The k-th intersection by represent line Data point with deleted tolerant error Data point with added tolerant error The starting point The next coming data point The data point at which the FSW becomes empty The joint point at te An upper boundary line A lower boundary line The number of segments The intersection point at te by l The intersection point at te by u

Keywords Error bound  Piecewise linear approximation  Time series  Data mining List of symbols d A given error bound on each data point P A time series

H. Zhao Institute of Applied Mathematics, Hebei Academy of Sciences, Shijiazhuang 050081, China G. Li Department of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang 050031, China H. Zhang (&) NIT, Zhejiang University, Ningbo 315100, China e-mail: [email protected] Y. Xue Guangzhou Higher Education Mega Centre, South China Normal University, Guangzhou 500006, China

1 Introduction A time series is a sequence of data points with a time stamp. Time series exist in many application domains such as medicine [1], economic [2, 11] and weather [3]. The volume of time series stream data grows rapidly in some fields [12]. This leads that we can not store the whole time series and may loss some important information. So the huge data not only poses a challenge to store and represent, but also makes the processing methodologies [4, 17] more complicated. In addition, some time series may be generated continually. This need to analyze data in almost real time. Therefore, for time series streams, the algorithm must have the ability to deal with dynamic data. Piecewise linear approximation (PLA) is a widely used method because of its simplicity. PLA uses line segments to represent a time series data stream. This strategy is a

123

Int. J. Mach. Learn. & Cyber.

good preprocessing of mass data set and can make the processing methods relatively easy. There are many algorithms in PLA. According the cu