Error Propagation in Isometric Log-ratio Coordinates for Compositional Data: Theoretical and Practical Considerations


Mehmet Can Mert¹ · Peter Filzmoser¹ · Karel Hron²

Received: 27 August 2015 / Accepted: 15 June 2016
© The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract Compositional data, such as the concentrations of chemical elements in soil samples typically encountered in geochemistry, need to be expressed in log-ratio coordinates before traditional statistical tools are applied, if the relative structure of the data is of primary interest. There are different possibilities for this purpose, such as centered log-ratio coefficients or isometric log-ratio coordinates. Both approaches involve geometric means of the compositional parts, and it is unclear how measurement errors or detection limit problems affect their presentation in coordinates. This problem is investigated theoretically by making use of the theory of error propagation. Due to certain limitations of this approach, the effect of error propagation is also studied by means of simulations. This makes it possible to provide recommendations for practitioners on the amount of error and on the expected distortion of the results, depending on the purpose of the analysis.

Keywords Aitchison geometry · Orthonormal coordinates · Taylor approximation · Compositional differential calculus · Detection limit


¹ Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Wiedner Hauptstrasse 8-10, 1040 Vienna, Austria

² Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, 17. listopadu 12, 771 46 Olomouc, Czech Republic


Math Geosci

1 Introduction

Compositional data analysis is concerned with analyzing the relative information between the variables, the so-called compositional parts, of a multivariate data set. Here, relative information refers to the log-ratio methodology (Aitchison 1986) and, therefore, in fact, to an analysis of logarithms of ratios between the compositional parts. It has been demonstrated that the sample space of compositions is not the usual Euclidean space, but the simplex with the so-called Aitchison geometry (Pawlowsky-Glahn et al. 2015). For a composition $x = (x_1, \ldots, x_D)$ with $D$ parts, the simplex sample space is defined as

$$S^D = \Big\{ x = (x_1, \ldots, x_D) \ \text{such that} \ x_j > 0 \ \forall j, \ \sum_{j=1}^{D} x_j = \kappa \Big\}$$

for an arbitrary constant κ. Nevertheless, according to recent developments, the sample space of compositional data is even more general (Pawlowsky-Glahn et al. 2015): a vector x is a D-part composition when all its components are strictly positive real numbers and carry only relative information. Note that the term relative information means that the information lies in the ratios between the components, not in the absolute values. As a consequence, the actual sample space of compositional data