Multivariate functional data modeling with time-varying clustering



Philip A. White¹ · Alan E. Gelfand²

Received: 9 March 2020 / Accepted: 29 August 2020 © Sociedad de Estadística e Investigación Operativa 2020

Abstract

We consider the setting of multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites where we allow such clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ partitioning of the timescale to capture time-varying clustering.

Our illustrative setting is bivariate, monitoring ozone and PM10 levels over time for one year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City for 2017 which record hourly ozone and PM10 levels. Hence, we have 48 functions to work with across 8760 hours. We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors along with adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.

Keywords Dimension reduction · Dirichlet process · Hierarchical model · Latent factor models · Multivariate Gaussian process · Ozone · PM10

Mathematics Subject Classification 60J25 · 60G15 · 62F15 · 62H25 · 62H30 · 62P12
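The idea of letting site clusters change across a partitioned timescale can be illustrated with a small simulation. The sketch below is not the authors' model: it only draws site labels from the Chinese restaurant process representation of a Dirichlet process prior, independently for each time block, and ignores the Gaussian process likelihood and the site-specific coefficients entirely. The number of blocks (12), the concentration parameter `alpha`, and the helper name `crp_labels` are assumptions made for illustration; the 24 sites, two pollutants, and hourly resolution come from the abstract.

```python
import numpy as np


def crp_labels(n_sites, alpha, rng):
    """Draw cluster labels for n_sites from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha is the DP concentration.
    """
    labels = np.zeros(n_sites, dtype=int)
    counts = [1]  # the first site opens cluster 0
    for i in range(1, n_sites):
        # Existing clusters are chosen in proportion to their size,
        # a new cluster in proportion to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels[i] = k
    return labels


# Dimensions stated in the abstract.
n_sites = 24
n_functions = n_sites * 2   # ozone and PM10 at each site
n_hours = 365 * 24          # hourly records for 2017

# Assumed partition of the year into 12 blocks (an illustrative choice).
n_blocks = 12
rng = np.random.default_rng(0)

# One row of site labels per time block; rows may differ, which is what
# "time-varying clustering" means at the level of the prior.
labels = np.stack(
    [crp_labels(n_sites, alpha=1.0, rng=rng) for _ in range(n_blocks)]
)
n_clusters_per_block = labels.max(axis=1) + 1
```

Each row of `labels` is one partition of the 24 sites, so comparing rows shows how cluster membership could differ from block to block. In a full analysis the labels would be updated from the data rather than drawn from the prior alone.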

Philip A. White [email protected]

¹ Department of Statistics, Brigham Young University, Provo, UT, USA

² Department of Statistical Science, Duke University, Durham, NC, USA


1 Introduction

Functional data analysis has become a widely used tool for learning about the behavior of a complex process defined over a continuous domain. Its formal emergence in statistics dates to Ramsay (1982) and Ramsay and Dalzell (1991), with Ramsay and Silverman (2007) providing an introduction to the concepts of functional data analysis. Applications include MRI brain images (and, in fact, imaging in general), finance, climatic variation, spectrometry data, and time-course gene expression data, to name a few. For a comprehensive overview of applications, see Ullah and Finch (2013). The domain is viewed as a subset of R^d, where d = 1, 2, or 3 in most applications, and the response is usually viewed as smooth over the domain, i.e., at least continuous.