TurboLift: fast accuracy lifting for historical data recovery


REGULAR PAPER

Fan Yang1 · Faisal M. Almutairi2 · Hyun Ah Song3 · Christos Faloutsos3 · Nicholas D. Sidiropoulos4 · Vladimir Zadorozhny1

Received: 3 March 2019 / Revised: 3 February 2020 / Accepted: 24 February 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one report) or gaps (i.e., time-ticks not covered by any report). However, data analysis and machine learning models require reconstructing the historical events at a finer granularity, e.g., the weekly patient counts, for elaborate analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. Time series disaggregation methods commonly utilize domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve the reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions provided by existing disaggregation methods. Starting from a solution produced by a specific method, TurboLift finds a new solution that reduces the disaggregation error and is close to the initial one. We derive a closed-form solution to the proposed formulation of TurboLift that enables us to obtain an accurate reconstruction analytically, without performing resource- and time-consuming iterations. Experiments on real data from different domains showcase the effectiveness of TurboLift in terms of disaggregation error, and outlier and anomaly detection.

Keywords Historical data · Information fusion · Information disaggregation
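To make the setting concrete, the following is a minimal sketch of how aggregated reports with overlaps and gaps can be represented. It is an illustration only and not the authors' implementation: the helper names (`observation_matrix`, `aggregate`, `coverage`), the weekly counts, and the reporting periods are all hypothetical.

```python
from typing import List, Tuple

# A report covers an inclusive range of 0-indexed time-ticks (hypothetical encoding).
Report = Tuple[int, int]


def observation_matrix(reports: List[Report], n_ticks: int) -> List[List[int]]:
    """Row i holds a 1 at every time-tick covered by report i, else 0."""
    return [[1 if start <= t <= end else 0 for t in range(n_ticks)]
            for start, end in reports]


def aggregate(matrix: List[List[int]], series: List[int]) -> List[int]:
    """Sum the fine-grained series over the ticks each report covers."""
    return [sum(a * x for a, x in zip(row, series)) for row in matrix]


def coverage(matrix: List[List[int]]) -> List[int]:
    """Number of reports covering each time-tick (column sums)."""
    return [sum(col) for col in zip(*matrix)]


# Hypothetical weekly counts (unknown in practice) and reporting periods.
weekly = [3, 5, 2, 4, 6, 1, 7, 2]
reports = [(0, 3), (2, 5), (7, 7)]

A = observation_matrix(reports, len(weekly))
y = aggregate(A, weekly)  # the aggregated values a database would store
cov = coverage(A)
overlaps = [t for t, c in enumerate(cov) if c > 1]  # ticks in more than one report
gaps = [t for t, c in enumerate(cov) if c == 0]     # ticks in no report

print(y, overlaps, gaps)
```

In this toy example, disaggregation means recovering a series like `weekly` from `y` and `A` alone. The gap at one tick and the overlap at two others show why the problem is generally underdetermined and why additional domain knowledge is needed.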

1 University of Pittsburgh, Pittsburgh, USA
2 University of Minnesota, Minneapolis, USA
3 Carnegie Mellon University, Pittsburgh, USA
4 University of Virginia, Charlottesville, USA

1 Introduction

Numerous historical datasets have been collected and made available to the public by different groups worldwide, such as the Institute for Quantitative Social Science (IQSS) at Harvard University, Great Britain Historical GIS at the University of Portsmouth, the International Institute of Social History in Amsterdam, and World-Historical Database at the University of Pittsburgh. Interpreting and mining these historical data involve consolidating and fusing large amounts of data from different sources. In health science, for instance, the Vaccine Modeling Initiative at the University of Pittsburgh aims to gather and analyze information from thousands of reports on epidemiological