Multivariate power series interpoint distances


ORIGINAL PAPER

Reza Modarres¹ · Yu Song¹

Accepted: 19 January 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

We (a) establish the probability mass function of the interpoint distance (IPD) between random vectors drawn from the multivariate power series family of distributions (MPSD); (b) obtain the distribution of the IPD within one sample and across two samples from this family; (c) determine the distribution of the MPSD Euclidean norm and of the distance from fixed points in Z^d; and (d) provide the distribution of the IPDs of vectors drawn from a mixture of MPSD distributions. We also present a method for testing the homogeneity of MPSD mixtures using the sample IPDs.

Keywords MPSD family · Interpoint distance · Norm and mixtures

Mathematics Subject Classification 62H10 · 62E15 · 62H15

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s10260-020-00508-8) contains supplementary material, which is available to authorized users.

Correspondence: Reza Modarres, [email protected]

¹ Department of Statistics, George Washington University, Washington, DC, USA

1 Introduction

The goal of this paper is to find the distributions of the squared Euclidean interpoint distance (IPD) for the multivariate power series distributions (MPSD). The prominent members of this family are the multinomial (MN), negative multinomial (NGMN), multivariate Poisson (MP), and multivariate logarithmic (ML) distributions. IPDs find applications in many scientific fields and are the building blocks of several multivariate techniques, including comparison of distributions, clustering, classification, correspondence analysis, and multidimensional scaling. Analysis of point patterns (Ripley 1977), minimal spanning trees for detection of spatial disease clusters in health surveillance systems, tests of the homogeneity of distributions, and data depth functions can all be investigated by using IPDs and their distributions. While not all tests of homogeneity of distributions depend on IPDs, tests based on IPDs are very competitive with specialized tests in high dimensional cases where classical homogeneity tests may fail. For example, Kolesnik (2014) obtains the distribution of the IPD between two independent telegraph processes. Modarres (2013) investigates the distribution of IPDs among observations drawn from a high dimensional multivariate Bernoulli distribution. Lukens (2004) uses IPD densities from a mixture of two multivariate normal distributions to identify high dimensional data structures and outlier groups. Osada et al. (2002) explore the practical aspects of using IPDs for discriminating shapes in image analysis, and Berrendero et al. (2016) identify a shape with the corresponding IPD distribution. Categorical data arise in numerous circumstances including biology, ecology, physics, gene expression analysis and other scientific fields, either as sampling