Classical and Robust Regression Analysis with Compositional Data



K. G. van den Boogaart¹ · P. Filzmoser² · K. Hron³ · M. Templ⁴ · R. Tolosana-Delgado¹

Received: 15 November 2019 / Accepted: 14 September 2020 © The Author(s) 2020

Abstract Compositional data carry their relevant information in the relationships (logratios) between the compositional parts. It is shown how this source of information can be used in regression modeling, where the composition can form the response, the explanatory part, or even both. An essential step in setting up a regression model is deciding how the composition(s) enter the model. Here, balance coordinates are constructed that support an interpretation of the regression coefficients and allow for testing hypotheses of subcompositional independence. Both classical least-squares regression and robust MM regression are treated, and they are compared within different regression models on a real data set from a geochemical mapping project.
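Balance coordinates, and a regression with the composition as the explanatory part, can be illustrated with a short sketch. It assumes the pivot (ilr) construction, in which each coordinate contrasts one part with the geometric mean of the parts that follow it. This is one common way to build balances and may differ in detail from the balances constructed later in the paper.

The helpers `pivot_coordinates` and `fit_ols` are hypothetical names introduced here. Only NumPy is used, and the data are synthetic. The sketch shows two things: rescaling a sample (for example, closing it to 100) leaves its logratio coordinates unchanged, and a classical least-squares fit on those coordinates recovers the coefficients of the model that generated the response. The robust MM estimator is not reproduced.

```python
import numpy as np

def pivot_coordinates(x):
    """Pivot balance (ilr) coordinates of the compositions in the rows of x.

    For parts x_1..x_D, coordinate j (1-based, j = 1..D-1) is
        z_j = sqrt((D-j)/(D-j+1)) * ln(x_j / gmean(x_{j+1}, ..., x_D)),
    i.e. a balance of part j against the parts that follow it.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts must be strictly positive")
    logx = np.log(x)
    n, D = logx.shape
    z = np.empty((n, D - 1))
    for j in range(D - 1):                     # 0-based pivot index
        k = D - j - 1                          # number of parts after the pivot
        rest = logx[:, j + 1:].mean(axis=1)    # log of their geometric mean
        z[:, j] = np.sqrt(k / (k + 1)) * (logx[:, j] - rest)
    return z

def fit_ols(z, y):
    """Classical least-squares fit of y on an intercept plus the columns of z."""
    design = np.column_stack([np.ones(len(y)), z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic illustration (made-up data, not from the paper)
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=(50, 4))      # 50 samples, D = 4 parts
z = pivot_coordinates(x)

# Logratios ignore the total: closing every row to 100 leaves the
# balance coordinates unchanged.
closed = 100 * x / x.sum(axis=1, keepdims=True)
print(np.allclose(z, pivot_coordinates(closed)))   # True

# Real-valued response generated from the first and third balances;
# least squares should recover intercept 1, slopes 2, 0 and -0.5.
y = 1.0 + 2.0 * z[:, 0] - 0.5 * z[:, 2]
print(np.round(fit_ols(z, y), 6))
```

The coefficient of the first pivot coordinate refers to part 1 relative to all remaining parts. A common strategy is therefore to permute the parts so that each one comes first in turn, which yields one directly interpretable coefficient per part. For a robust fit, an MM estimator (for instance R's `robustbase::lmrob`) would take the place of `fit_ols`. That substitution is not shown here.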


1 Helmholtz Institut Freiberg for Resources Technology, Freiberg, Germany
2 Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria
3 Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic
4 Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, Winterthur, Switzerland


Math Geosci

Keywords Balances · Robust regression · GEMAS project · Hypothesis testing · Robust bootstrap

1 Introduction

Although regression analysis is among the most developed statistical procedures, comparatively little literature is available on regression involving compositional data (see, e.g., Aitchison 1986; Daunis-i-Estadella et al. 2002; Tolosana-Delgado and van den Boogaart 2011; van den Boogaart and Tolosana-Delgado 2013; Egozcue et al. 2013; Pawlowsky-Glahn et al. 2015; Fišerová et al. 2016; Coenders et al. 2017; Filzmoser et al. 2018; Greenacre 2019). Regression with compositional data presents certain particularities because of the special statistical scale of compositions. Compositions have typically (and restrictively) been defined as vectors of positive components summing up to a constant, with the simplex as their sampling space (Aitchison 1986). However, since the beginning of the twenty-first century, it has become clearer that data may not abide by the constant-sum condition and may nevertheless be sensibly considered as compositional, the determining point being the relativity of the information provided (Aitchison 1997; Barceló-Vidal et al. 2001). That is, when dealing with compositional data, the crucial quantities are the ratios between the compositional parts (i.e., the variables) rather than the reported data values themselves. There are different proposals to extract this relative