BlackBox: Generalizable reconstruction of extremal values from incomplete spatio-temporal data
Tomislav Ivek¹ · Domagoj Vlah²
Received: 1 May 2020 / Revised: 7 October 2020 / Accepted: 8 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract We describe our submission to the Extreme Value Analysis 2019 Data Challenge in which teams were asked to predict extremes of sea surface temperature anomaly within spatiotemporal regions of missing data. We present a computational framework which reconstructs missing data using convolutional deep neural networks. Conditioned on incomplete data, we employ autoencoder-like models as multivariate conditional distributions from which possible reconstructions of the complete dataset are sampled using imputed noise. In order to mitigate bias introduced by any one particular model, a prediction ensemble is constructed to create the final distribution of extremal values. Our method does not rely on expert knowledge in order to accurately reproduce dynamic features of a complex oceanographic system with minimal assumptions. The obtained results promise reusability and generalization to other domains.

Keywords Convolutional neural network · Data reconstruction · Deep learning · Extreme Value Analysis Conference challenge · Ensemble · Spatio-temporal extremes

Mathematics Subject Classification (2010) 62P12 · 68T07
¹ Institut za fiziku, Bijenička 46, HR-10000 Zagreb, Croatia
² Department of Applied Mathematics, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia

1 Introduction
The EVA 2019 Data Challenge posed the problem of predicting extremes of the Red Sea surface temperature anomaly within spatio-temporal regions of missing data (Huser 2020). Contestants were provided with daily temperature anomaly values spanning 31 years and covering the geographical area of the Red Sea. For each day, temperature anomaly values were given at fixed spatial points on a regular geographical grid. About 31.6% of the data was deliberately removed from the dataset. Regions of missing data were approximately contiguous with irregular boundaries, relatively large, at least one calendar month in duration, and present for every calendar day in the provided dataset. The exact process of data removal was not disclosed to contestants. The goal was to predict the distribution of extremes of temperature anomaly on a number of specified space-time cylindrical regions (50 km in radius and 7 days in length), chosen in the most difficult part of the dataset, in which 60% of the data was missing on any given day. The quality of predicted extremes was evaluated using the threshold-weighted continuous ranked probability score (twCRPS), averaged over all prediction regions (Huser 2020).

Recently there has been an increase in adoption of deep neural network models in various areas of research and technology which feature high-dimensional interdependent da
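To make the evaluation metric named in the Introduction concrete, the following is a minimal sketch of a threshold-weighted continuous ranked probability score for a single prediction region. It uses the standard form twCRPS(F, y) = ∫ w(x) (F(x) − 1{y ≤ x})² dx, where F is the predictive distribution and y the observed extreme. The function names `tw_crps` and `logistic_weight`, the shape and parameters of the weight function, and the integration grid are all assumptions made for illustration; they are not the weight or settings used by the challenge organizers.

```python
import numpy as np


def logistic_weight(x, center=1.0, scale=0.5):
    """Hypothetical weight function that emphasizes large values.

    The center and scale are illustrative assumptions, not challenge settings.
    """
    return 1.0 / (1.0 + np.exp(-(x - center) / scale))


def tw_crps(samples, observed, weight, grid):
    """Threshold-weighted CRPS of an empirical predictive distribution.

    Approximates the integral of weight(x) * (F(x) - 1{observed <= x})^2
    over `grid` with the trapezoidal rule, where F is the empirical CDF
    of `samples`.
    """
    samples = np.sort(np.asarray(samples, dtype=float))
    grid = np.asarray(grid, dtype=float)
    # Empirical CDF on the grid: fraction of samples less than or equal to x.
    cdf = np.searchsorted(samples, grid, side="right") / samples.size
    # Indicator of the observation: 1 where observed <= x, else 0.
    step = (grid >= observed).astype(float)
    integrand = weight(grid) * (cdf - step) ** 2
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


# Illustrative comparison on synthetic numbers (not challenge data).
rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 10.0, 3001)
good = rng.normal(2.0, 0.3, size=500)  # forecast centered near the observation
poor = rng.normal(0.0, 0.3, size=500)  # forecast centered far from it
score_good = tw_crps(good, 2.1, logistic_weight, grid)
score_poor = tw_crps(poor, 2.1, logistic_weight, grid)
print(score_good, score_poor)
```

A lower score indicates a better forecast. Averaging `tw_crps` over all prediction regions would give a single figure of the kind described above, under the stated assumptions about the weight function.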