Goodness-of-Fit Measures for Two-mode Cluster Analyses

Two-mode cluster analyses are enjoying increasing popularity not only in psychological but also in management applications, for example in controlling advertising effects (Schwaiger 1997a). To date, only a few indices exist that ar


1 Introduction

W. Gaul et al. (eds.), Classification, Automation, and New Media © Springer-Verlag Berlin, Heidelberg 2002

Classification methods are a crucial part of exploratory data analysis. Their value as practical business tools shows especially when results are interpreted: a typology does not come to life, nor does a two-mode dendrogram become helpful for decisions, before the resulting clusters have been interpreted. A researcher's scepticism about exploratory results is generally considered important, because number crunching may lead to misleading conclusions in the interpretation, one reason being that not every detail of the data can be taken into consideration. The analysis of classification results therefore raises the basic question of how a given classification can be evaluated. The answer depends mainly on the goodness-of-fit criterion used. We therefore discuss the few available goodness-of-fit measures for two-mode cluster analysis in Section 2 and then present an adaptation of the variance criterion (e.g. Opitz 1980, p. 76) to two-mode clusters, together with its application, in Section 3.

In two-mode cluster analysis, the row and column elements of a data matrix are classified simultaneously. The input is a data matrix $X = (x_{ij})_{n \times m}$. Any two-mode matrix can serve as the data matrix $X$ if it is created by assigning numerical values $x_{ij}$ to the elements $O_i \in O$, $M_j \in M$ of the Cartesian product $O \times M$ ($O = \{O_1, \ldots, O_n\}$, $M = \{M_1, \ldots, M_m\}$). Applications in the economic and social sciences are described in DeSarbo (1982), De Soete (1984), Both/Gaul (1986), Eckes (1993 and 1995) and Schwaiger (1998).

To illustrate our explanations we use an association matrix obtained by averaging the ratings of 53 students, who were asked to evaluate eight (more or less) famous persons on given attributes using a six-point, bipolar scale. Two-mode cluster analysis is then meant to unite the prominent persons and the attributes primarily associated with them in two-mode clusters. Different algorithms are available for that purpose (cf. Schwaiger 1997a, p. 96ff.). We focus on hierarchical methods, which, in contrast to non-hierarchical methods, produce a dendrogram and thus also show the process of fusion. The methods used are the Centroid-Effect-Method CEM (Eckes/Orlik 1991, 1993), the Missing-Value-Algorithm (Espejo/Gaul 1986, p. 123) in its Average-Linkage variant (MVAL), and the ESOCLUS-Algorithm (Schwaiger 1997a, p. 113ff.) in its Complete-Linkage and Average-Linkage forms (ESOCLUS CL and ESOCLUS AL, respectively). The classification results differ, so we have to decide, on the basis of a goodness-of-fit index, which results to rely on.
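
As a rough illustration of how an association matrix of this kind could be assembled, the following Python sketch averages individual ratings over raters to obtain one value per person-attribute pair. It is only a sketch under stated assumptions: the ratings are randomly generated placeholders and not the study's data, and the number of attributes (`N_ATTRIBUTES = 10`), the coding of the scale as the integers 1 to 6, and the function name `association_matrix` are all hypothetical, since the text specifies none of them.

```python
import random
from statistics import fmean

# Figures taken from the text: 53 raters, 8 persons, a six-point scale.
N_STUDENTS = 53
N_PERSONS = 8
SCALE = range(1, 7)  # assumption: scale points coded as 1..6

# Assumption: the text does not state the number of attributes; 10 is a placeholder.
N_ATTRIBUTES = 10


def association_matrix(ratings):
    """Average ratings over raters.

    ratings[s][i][j] is the rating given by rater s to person i on attribute j.
    Returns X as a list of rows, where X[i][j] is the mean rating of
    person i on attribute j.
    """
    n_persons = len(ratings[0])
    n_attributes = len(ratings[0][0])
    return [
        [fmean(r[i][j] for r in ratings) for j in range(n_attributes)]
        for i in range(n_persons)
    ]


# Synthetic ratings for illustration only; these are NOT the study's data.
rng = random.Random(0)
ratings = [
    [[rng.choice(SCALE) for _ in range(N_ATTRIBUTES)] for _ in range(N_PERSONS)]
    for _ in range(N_STUDENTS)
]

X = association_matrix(ratings)
print(len(X), len(X[0]))  # 8 10
```

The resulting matrix has one row per person and one column per attribute, which is the two-mode layout that the clustering methods named above take as input; the clustering algorithms themselves are not sketched here.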

2 Goodness-of-Fit Indicators for Two-Mode Classification Results

A qualitative evaluation of two-mode cluster analysis is in principle possible with representation methods such as factor analysis, MDS and correspondence analysis if their goodness of