k-Gaps: a novel technique for clustering incomplete climatological time series


ORIGINAL PAPER

Leopoldo Carro-Calvo1 · Fernando Jaume-Santero2,3 · Ricardo García-Herrera2,3 · Sancho Salcedo-Sanz4

Received: 20 December 2019 / Accepted: 16 September 2020 / © The Author(s) 2020

Abstract In this paper, we present a new clustering technique (k-gaps) that generates a robust regionalization from sparse climate datasets with incomplete information in space and time. The method clusters time series of different temporal lengths, using most of the information contained in heterogeneous sets of climate records that would otherwise be discarded during data homogenization procedures. The robustness of the method has been validated with different synthetic datasets, demonstrating that k-gaps performs well with sample-starved datasets and missing climate information for at least 55% of the study period. We show that, from temperature observations, the algorithm generates a climatically consistent regionalization similar to those obtained with complete time series, outperforming other clustering methodologies developed to work with fragmentary information. k-Gaps clusters can therefore provide a useful framework for the study of long-term climate trends and the detection of past extreme events at regional scales.

Keywords Clustering techniques · Climatological time series · Climate trends · Regional analysis

1 Introduction

Marked variations in regional climate patterns arise as a response to persistent changes of the climate system. Identifying these patterns is therefore fundamental for a better understanding of past climate changes at local and regional scales. With increasing computational power, the number of classification methodologies providing robust characterizations of regional climates has grown quickly in the climate community, and they have become a common tool for the study of past climatic patterns (Abatzoglou et al. 2009; Srivastava et al. 2012; Perdinan 2015; Horton et al. 2015).

Fernando Jaume-Santero, [email protected]

1 Department of Signal Processing and Communications, Universidad Rey Juan Carlos, Fuenlabrada, Madrid, Spain
2 Department of Earth Physics and Astrophysics, Universidad Complutense de Madrid, Madrid, Spain
3 Geosciences Institute (IGEO), (CSIC/UCM), Madrid, Spain
4 Department of Signal Processing and Communications, Universidad de Alcalá, Madrid, Spain

Within this framework, classical clustering techniques, such as the k-means algorithm (Hartigan and Wong 1979; Phillips 2002), have become widespread in the past few years as dimensionality reduction methods able to extract relevant information from extensive climate databases (Bernard et al. 2013; Bador et al. 2015; Zhang et al. 2016). These methodologies arrange data according to their internal structure, defining spatial regions for datasets with geolocated climate information (Rao and Srinivas 2006). Therefore, clustering algorithms have been used in several studies such as