Composite likelihood methods for histogram-valued random variables
T. Whitaker¹ · B. Beranger¹ · S. A. Sisson¹
Received: 5 November 2019 / Accepted: 28 May 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller, more tractable number of distributions, such as random rectangles or histograms, each describing a portion of the larger dataset. Recent work has developed likelihood-based methods that permit fitting models for the underlying data while observing only the distributional summaries. However, while powerful, this approach rapidly becomes computationally intractable for random histograms as the dimension of the underlying data increases. We introduce a composite-likelihood variant of this likelihood-based approach for the analysis of random histograms in K dimensions, constructed from lower-dimensional marginal histograms. The performance of this approach is examined through simulated and real-data analyses of max-stable models for spatial extremes, using millions of observed data points in more than K = 100 dimensions. Large computational savings are available compared with existing model-fitting approaches.

Keywords Climate models · Composite likelihoods · Random histograms · Spatial extremes · Symbolic data analysis
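One way to see why the full histogram likelihood becomes intractable: with B bins per margin, a single K-dimensional histogram has B^K bins, whereas the collection of all bivariate marginal histograms has only C(K, 2)·B² bins (for K = 100 and B = 10, that is 10^100 versus 495,000). The sketch below illustrates a pairwise composite likelihood under loudly stated assumptions: bin counts of each marginal histogram are modelled as multinomial, all margins share the same cut points, and the underlying model (`toy_bin_probs`, independent N(mu, 1) margins) is a hypothetical stand-in, not the max-stable models analysed in the paper. All function names are illustrative, and the code requires NumPy and SciPy.

```python
from itertools import combinations
from math import comb

import numpy as np
from scipy.special import gammaln, xlogy
from scipy.stats import norm


def n_bins(K, B):
    """Bins in one full K-dimensional histogram vs. all bivariate marginals."""
    return B ** K, comb(K, 2) * B ** 2


def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood of bin counts given bin probabilities."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    # xlogy returns 0 for 0 * log(0), so empty zero-probability bins are safe
    return gammaln(n + 1) - gammaln(counts + 1).sum() + xlogy(counts, probs).sum()


def pairwise_histograms(x, cuts):
    """Bivariate marginal histograms of an (n, K) data matrix, shared cut points."""
    B = len(cuts) + 1
    idx = np.searchsorted(cuts, x, side="right")  # bin index 0..B-1 of every entry
    hists = {}
    for i, j in combinations(range(x.shape[1]), 2):
        h = np.zeros((B, B))
        np.add.at(h, (idx[:, i], idx[:, j]), 1)  # count co-occurring bin pairs
        hists[(i, j)] = h
    return hists


def toy_bin_probs(mu, cuts):
    """HYPOTHETICAL model: independent N(mu, 1) margins, so each bivariate
    bin probability is a product of two univariate bin probabilities."""
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    p = np.diff(norm.cdf(edges, loc=mu))
    return np.outer(p, p)


def pairwise_composite_loglik(mu, hists, cuts):
    """Sum of bivariate histogram log-likelihoods, ignoring dependence between pairs."""
    P = toy_bin_probs(mu, cuts)
    return sum(multinomial_loglik(h, P) for h in hists.values())


# Usage: estimate mu from the pairwise histograms alone, never the raw x
rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, size=(2000, 4))
cuts = np.array([-1.0, 0.0, 1.0])
hists = pairwise_histograms(x, cuts)
grid = np.linspace(-1.0, 2.0, 61)
mu_hat = grid[np.argmax([pairwise_composite_loglik(m, hists, cuts) for m in grid])]
print(n_bins(100, 10)[1], mu_hat)  # mu_hat lands near the true value 0.5
```

The grid search keeps the example dependency-light; in practice a numerical optimiser would replace it, and the toy model would be replaced by bin probabilities computed from the actual model of interest.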
1 Introduction

Continuing advances in measurement technology and information storage are leading to the creation of increasingly large and complex datasets. This inevitably brings new inferential challenges. Symbolic data analysis (SDA), a relatively new field in statistics, has been developed as one way of addressing these issues (e.g. Diday 1989; Bock and Diday 2000). In essence, SDA argues that many important questions can be answered without needing to observe data at the micro-level, and that higher-level, group-based information may be sufficient. As a result, SDA methodology aggregates the micro-data into a much smaller number of distributional summaries, such as random rectangles, random histograms and categorical multi-valued variables, each summarising a portion of the larger dataset (Dias and Brito 2015; Le Rademacher and Billard 2013; Billard and Diday 2006).
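To make the aggregation step concrete, the following is a minimal sketch that turns each group of micro-data into one histogram-valued observation, i.e. a vector of bin counts. It assumes univariate micro-data, a group label for every observation, and fixed cut points shared by all groups; `summarise_groups` and the chosen cut points are hypothetical and not taken from the paper.

```python
import numpy as np


def summarise_groups(values, groups, cuts):
    """Turn micro-data into one histogram-valued observation per group.

    values : 1-D array of micro-level observations
    groups : 1-D array of group labels, one per observation
    cuts   : increasing interior cut points shared by every group, giving
             len(cuts) + 1 bins that together cover the whole real line
    """
    n_bins = len(cuts) + 1
    summaries = {}
    for g in np.unique(groups):
        # bin index i means cuts[i-1] <= x < cuts[i] (first/last bins open-ended)
        idx = np.searchsorted(cuts, values[groups == g], side="right")
        summaries[g.item()] = np.bincount(idx, minlength=n_bins)
    return summaries


rng = np.random.default_rng(0)
values = rng.normal(size=100_000)           # the "micro-data"
groups = rng.integers(0, 5, size=100_000)   # five groups
hists = summarise_groups(values, groups, np.array([-1.0, 0.0, 1.0]))
for g, counts in hists.items():
    print(g, counts)  # 100,000 points reduced to 5 vectors of 4 counts
```

Subsequent analysis would then work only with `hists`, in keeping with the SDA idea of analysing the summaries without further reference to the micro-data.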
¹ UNSW Data Science Hub and School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
These new data "points" (i.e. distributions) are then analysed directly, without any further reference to the micro-data. See e.g. Billard (2011), Bertrand and Goupil (2000) and Billard and Diday (2003) for an exposition of these ideas. SDA methods have found wide application in current statistical practice, and have been developed for a range of inferential procedures, including regression models (Dias and Brito 2015), principal component analysis (Kosmelj and Billard 2014), time series analysis (Wang et al. 2016), clustering (Bri