An R Package for Generating Covariance Matrices for Maximum-Entropy Sampling from Precipitation Chemistry Data


Hessa Al-Thani¹ · Jon Lee¹

Received: 23 December 2019 / Accepted: 20 March 2020

© Springer Nature Switzerland AG 2020

Abstract

We present an open-source R package (MESgenCov v 0.1.0) for temporally fitting multivariate precipitation chemistry data and extracting a covariance matrix for use in the MESP (maximum-entropy sampling problem). We provide multiple functionalities for modeling and model assessment. The package is tightly coupled with NADP/NTN (National Atmospheric Deposition Program/National Trends Network) data from their set of 379 monitoring sites, 1978–present. The user specifies the sites, chemicals, and time period desired, fits an appropriate user-specified univariate model for each site and chemical selected, and the package produces a covariance matrix for use by MESP algorithms.

Keywords: Maximum-entropy sampling · Covariance matrix · Environmental monitoring · Environmetrics · NADP · NTN

Jon Lee · Hessa Al-Thani
¹ Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA

SN Operations Research Forum (2020) 1:17

1 Introduction

The MESP (maximum-entropy sampling problem) (see [8, 16, 23, 24]) has been applied in many domains where the objective is to determine a “most informative” subset Y_S, of pre-specified size s = |S| > 0, from a Gaussian random vector Y_N, with |N| = n > s. Information is typically measured by (differential) entropy. Generally, we assume that Y_N has a joint Gaussian distribution with mean vector μ and covariance matrix C. Up to constants, the entropy of Y_S is the log of the determinant of the principal submatrix C[S, S]. So, the MESP seeks to maximize the (log) determinant of C[S, S] over all S ⊆ N with |S| = s.

The MESP is NP-hard (see [14]), and there has been considerable work on algorithms aimed at exact solutions for problems of moderate size; see [1–5, 7, 12, 14, 15, 17]. All of this algorithmic work is based on a branch-and-bound framework introduced in [14], and the bulk of the contributions in these references concerns different methods for upper bounding the optimal value. This work has been developed and validated on only a very small number of data sets, even though multivariate data is widely available. The reason for this shortcoming is that, despite the abundance of raw multivariate data, it is not a simple matter to turn such data into meaningful covariance matrices for Gaussian random variables.

Our goal with the R package (MESgenCov v 0.1.0) that we have developed is to provide such a link, between readily available raw environmental-monitoring data and covariance matrices suitable for the MESP. Our work fits squarely into recent efforts to better exploit massive amounts of available data for mathematical-programming approaches to decision problems. Even if we have r
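To make the MESP objective stated above concrete, the following is a minimal sketch in Python (standard library only) that enumerates every subset S of size s and keeps the one maximizing log det C[S, S]. For a Gaussian vector, the differential entropy is h(Y_S) = (s/2) ln(2πe) + (1/2) ln det C[S, S], so maximizing the log-determinant is equivalent to maximizing entropy. This is an illustration only: it is not the branch-and-bound method of [14], it is not part of MESgenCov, the function names are hypothetical, and the 4 × 4 matrix is made up rather than derived from NADP/NTN data. Enumeration is exponential in n and is practical only for tiny instances.

```python
from itertools import combinations
from math import log, sqrt


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][i] = sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    # det(C) = (product of diag(L))^2, so log det(C) = 2 * sum(log(diag(L))).
    return 2.0 * sum(log(lower[i][i]) for i in range(n))


def principal_submatrix(cov, subset):
    """Return C[S, S]: the rows and columns of cov indexed by subset."""
    return [[cov[i][j] for j in subset] for i in subset]


def mesp_brute_force(cov, s):
    """Enumerate all size-s subsets S; return (best S, log det C[S, S])."""
    n = len(cov)
    if not 0 < s < n:
        raise ValueError("need 0 < s < n")
    best_subset, best_value = None, float("-inf")
    for subset in combinations(range(n), s):
        value = log_det_spd(principal_submatrix(cov, subset))
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value


# Made-up covariance matrix for illustration (hypothetical, not real data).
# Variables 0 and 1 have the largest variances but are strongly correlated.
toy_cov = [
    [4.0, 3.6, 0.4, 0.2],
    [3.6, 4.0, 0.5, 0.2],
    [0.4, 0.5, 2.0, 0.1],
    [0.2, 0.2, 0.1, 1.0],
]
best_subset, best_value = mesp_brute_force(toy_cov, 2)
print(best_subset, best_value)
```

On this toy matrix the selected pair is (0, 2) rather than the two highest-variance variables (0, 1): the strong correlation between variables 0 and 1 shrinks det C[S, S] to 3.04, whereas the pair (0, 2) gives 7.84. This is the sense in which the MESP rewards subsets that are jointly informative, not merely individually variable.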
