Confidence regions and other tools for an extension of correspondence analysis based on cumulative frequencies



Antonello D’Ambra · Pietro Amenta · Eric J. Beh

Received: 29 February 2020 / Accepted: 12 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Over the past 50 years, correspondence analysis (CA) has increasingly been used by data analysts to examine the association structure of categorical variables that are cross-classified to form a contingency table. However, the literature has paid little attention to the case where the variables are ordinal. Indeed, Pearson’s chi-squared statistic X² can perform badly in studying the association between ordinal categorical variables (Agresti in An introduction to categorical data analysis, Wiley, Hoboken, 1996; Barlow et al. in Statistical inference under order restrictions, Wiley, New York, 1972). Taguchi’s (Nair in Technometrics 28(4):283–291, 1986; Nair in J Am Stat Assoc 82:283–291, 1987) and Hirotsu’s (Biometrika 73:165–173, 1986) statistics have been introduced in the literature as simple alternatives to Pearson’s index for contingency tables with ordered categorical variables. Taguchi’s statistic takes into account the presence of an ordinal categorical variable by considering the cumulative sum of the cell frequencies across the variable. An extension of correspondence analysis using a decomposition of Taguchi’s statistic has been introduced to accommodate this feature of the variables. This extension considers the impact of differences between adjacent ordered categories on the association between row and column categories. The main aim of this paper is to introduce a confidence region for each of the ordered categories so that one may determine the statistical significance of a category with respect to the null hypothesis of independence. We highlight that the construction of these confidence circles has not been considered in the literature for this approach to CA. We also introduce a suitable decomposition of Taguchi’s statistic to test the statistical significance of each column category.

Keywords: Contingency table · Chi-squared statistic · Single cumulative chi-squared statistic · Confidence circle
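As an illustration of the cumulative construction mentioned in the abstract, the following is a minimal sketch of Taguchi's statistic in the form usually given in the cited literature, T = Σ_s w_s Σ_i n_i. (z_is / n_i. − d_s)², where z_is is the cumulative frequency of row i up to column s, d_s is the cumulative column proportion, and w_s = 1 / (d_s (1 − d_s)). The sketch assumes that the rows are nominal and the columns are the ordered variable, and that every d_s lies strictly between 0 and 1. The function names `pearson_chi2`, `taguchi_statistic` and `collapsed_tables`, and the toy table, are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def pearson_chi2(table):
    """Pearson's chi-squared statistic X^2 for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, n_i in zip(table, row_totals):
        for n_ij, n_j in zip(row, col_totals):
            expected = n_i * n_j / n  # expected count under independence
            stat += (n_ij - expected) ** 2 / expected
    return stat


def taguchi_statistic(table):
    """Taguchi's cumulative chi-squared statistic T.

    Assumption (illustrative): rows are nominal, columns are ordered,
    and no cumulative column proportion d_s equals 0 or 1.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # d_s: cumulative column proportions for s = 1, ..., J
    cum_props = [c / n for c in accumulate(col_totals)]
    stat = 0.0
    for s in range(len(col_totals) - 1):  # only the first J - 1 cut points
        d_s = cum_props[s]
        weight = 1.0 / (d_s * (1.0 - d_s))
        for row, n_i in zip(table, row_totals):
            z_is = sum(row[: s + 1])  # cumulative frequency of row i up to column s
            stat += weight * n_i * (z_is / n_i - d_s) ** 2
    return stat


def collapsed_tables(table):
    """Yield the J - 1 two-column tables: columns 1..s versus columns s+1..J."""
    for s in range(1, len(table[0])):
        yield [[sum(row[:s]), sum(row[s:])] for row in table]


# Hypothetical 2 x 3 table with ordered columns
table = [[10, 20, 30], [30, 20, 10]]
t = taguchi_statistic(table)
x2 = pearson_chi2(table)
sum_collapsed = sum(pearson_chi2(t2) for t2 in collapsed_tables(table))
print(t, x2, sum_collapsed)
```

With these weights, T equals the sum of the Pearson statistics of the J − 1 tables obtained by collapsing the ordered columns at each cut point, which is why it is called a cumulative chi-squared statistic. For the toy table above the sketch gives T = 30 (up to floating-point rounding), matching the sum over the collapsed tables, against X² = 20 for the uncollapsed table.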

* Pietro Amenta
Extended author information available on the last page of the article


1 Introduction

It is well known in the statistical literature that Pearson’s chi-squared test of independence between the variables of a contingency table does not perform well when the rows/columns of the table are ordered (Agresti 1996; Barlow et al. 1972); this is due in part to the low power for ordered alternatives to the null hypothesis. Barlow et al. (1972) discuss several exact and approximate likelihood ratio procedures for testing in these situations. Unfortunately, the distribution theory underlying these procedures can be complex. Taguchi’s statistic (Taguchi 1966, 1974) has been introduced as a simple alternative to Pearson’s statistic for ordered contingency tabl