A multi-resolution approximation via linear projection for large spatial datasets


Toshihiro Hirano¹

Received: 13 April 2020 / Accepted: 28 September 2020
© Japanese Federation of Statistical Science Associations 2020

Abstract

Recent technical advances in collecting spatial data have increased the demand for methods to analyze large spatial datasets. Statistical analysis of such datasets can provide useful knowledge in various fields. However, conventional spatial statistical methods, such as maximum-likelihood estimation and kriging, are impractically time-consuming for large spatial datasets because of the matrix inversions they require. To cope with this problem, we propose a multi-resolution approximation via linear projection (M-RA-lp). The M-RA-lp applies a linear projection approach to each subregion whenever the spatial domain is subdivided, which leads to an approximated covariance function that captures both the large- and small-scale spatial variations. Moreover, we derive algorithms for fast computation of the log-likelihood function and the predictive distribution with the approximated covariance function obtained by the M-RA-lp. Simulation studies and a real data analysis of air dose rates demonstrate that the proposed M-RA-lp performs well relative to related existing methods.

Keywords: Covariance tapering · Gaussian process · Geostatistics · Large spatial datasets · Multi-resolution approximation · Stochastic matrix approximation

1 Introduction

Advances in Global Navigation Satellite Systems (GNSS) and compact sensing devices have made it easy to collect large volumes of georeferenced spatial data in various fields such as environmental science, traffic, and urban engineering. Statistical analysis of these types of spatial datasets would support evidence-based environmental policy and the efficient management of smart cities.

* Toshihiro Hirano 1hirano2@kanto‑gakuin.ac.jp

1 College of Economics, Kanto Gakuin University, 1‑50‑1, Mutsuura Higashi, Kanazawa‑ku, Yokohama, Kanagawa 236‑8501, Japan


Japanese Journal of Statistics and Data Science

In spatial statistics, this type of statistical analysis, including model fitting and spatial prediction, has been conducted based on Gaussian processes (see, e.g., Cressie and Wikle 2011). However, traditional spatial statistical methods, such as maximum-likelihood estimation and kriging, are computationally infeasible for large spatial datasets, requiring $O(n^3)$ operations for a dataset of size $n$. This is because these methods involve the inversion of an $n \times n$ covariance matrix. This difficulty has encouraged the development of many efficient statistical techniques for large spatial datasets. Heaton et al. (2019) comprehensively review recent developments in these techniques. Liu et al. (2020) provide a detailed survey of current state-of-the-art scalable Gaussian processes in the machine learning literature. Efficient statistical techniques are generally categorized into four types: a sparse approach, a low r