The Isolation Principle of Clustering: Structural Characteristics and Implementation

ABSTRACT The isolation principle rests on defining internal and external differentiation for each subset of at least two objects. Subsets with larger external than internal differentiation form isolated groups in the sense that they are internally cohesive and externally isolated. Objects that do not belong to any isolated group are termed solitary. The collection of all isolated groups and solitary objects forms a hierarchical (encaptic) structure. This ubiquitous characteristic of biological organization provides the motivation to identify universally applicable practical methods for the detection of such structure, to distinguish primary types of structure, to quantify their distinctiveness, and to simplify interpretation of structural aspects. A method implementing the isolation principle (by generating all isolated groups and solitary objects) is proven to be specified by single-linkage clustering. Basically, the absence of structure can be stated if no isolated groups exist, the condition for which is provided. Structures that allow for classifications in the sense of complete partitioning into disjoint isolated groups are characterized, and measures of distinctiveness of classification are developed. Among other primary types of structure, chaining (complete nesting) and ties (isolated groups without internal structure) are considered in more detail. Some biological examples for the interpretation of structure resulting from application of the isolation principle are outlined.

Key Words: isolation principle, internal differentiation, external differentiation, encapsis, hierarchical structure, cluster method, single linkage, classification, measure of clustering structure, degree of cluster isolation
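The relation between isolated groups and single-linkage clustering stated in the abstract can be illustrated with a minimal sketch. The abstract does not give the formal definitions, so the following are assumptions made only for this illustration: the internal differentiation of a subset is taken as the smallest distance threshold at which the subset is connected by links not exceeding that threshold, the external differentiation as the smallest distance between a member and a non-member, and the complete set of objects is not counted as an isolated group. The distance matrix `D` and all function names are hypothetical.

```python
from itertools import combinations


def internal_diff(subset, d):
    """Assumed definition: smallest threshold at which `subset` is connected.

    Computed as the largest edge of a minimum spanning tree (Prim-style growth).
    """
    members = list(subset)
    reached = {members[0]}
    remaining = set(members[1:])
    largest = 0.0
    while remaining:
        dist, nxt = min((d[i][j], j) for i in reached for j in remaining)
        largest = max(largest, dist)
        reached.add(nxt)
        remaining.remove(nxt)
    return largest


def external_diff(subset, d):
    """Assumed definition: smallest distance from a member to a non-member."""
    outside = [j for j in range(len(d)) if j not in subset]
    return min(d[i][j] for i in subset for j in outside)


def isolated_groups_bruteforce(d):
    """All proper subsets of >= 2 objects with external > internal differentiation."""
    n = len(d)
    groups = set()
    for size in range(2, n):
        for subset in combinations(range(n), size):
            if external_diff(subset, d) > internal_diff(subset, d):
                groups.add(frozenset(subset))
    return groups


def single_linkage_clusters(d):
    """Connected components of the threshold graph at every distinct distance level."""
    n = len(d)
    levels = sorted({d[i][j] for i in range(n) for j in range(i + 1, n)})
    clusters = set()
    for t in levels:
        unseen = set(range(n))
        while unseen:
            start = unseen.pop()
            comp, stack = {start}, [start]
            while stack:
                i = stack.pop()
                linked = {j for j in unseen if d[i][j] <= t}
                unseen -= linked
                comp |= linked
                stack.extend(linked)
            # Keep only proper subsets with at least two objects.
            if 2 <= len(comp) < n:
                clusters.add(frozenset(comp))
    return clusters


# Hypothetical symmetric distance matrix for five objects (illustration only).
D = [
    [0.0, 1.0, 5.0, 6.0, 9.0],
    [1.0, 0.0, 5.5, 6.0, 9.0],
    [5.0, 5.5, 0.0, 2.0, 8.0],
    [6.0, 6.0, 2.0, 0.0, 8.5],
    [9.0, 9.0, 8.0, 8.5, 0.0],
]

isolated = isolated_groups_bruteforce(D)
via_linkage = single_linkage_clusters(D)
# Objects that belong to no isolated group are solitary.
solitary = set(range(len(D))) - set().union(*isolated)

print(sorted(sorted(g) for g in isolated))
print(sorted(solitary))
```

Under these assumed definitions, the exhaustive search over subsets and the single-linkage levels yield the same collection of groups for this matrix, and the groups are nested or disjoint, which is the hierarchical structure the abstract describes. The exhaustive search is exponential in the number of objects and serves only as a check on small examples.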

Acta Biotheoretica (2006) 54: 219–233. DOI: 10.1007/s10441-006-8255-3. © Springer 2006. H.-R. Gregorius

INTRODUCTION Every data analysis is characterized by (1) its specific objective, (2) an operational concept of how the objective is to be achieved, (3) a method that realizes the concept, and (4) the techniques and tools required for implementation of the method. The concept, in particular, provides the guidelines along which the results obtained with the help of the method and its implementation are to be interpreted. A method thus gains significance solely through the concept that it serves. The less explicit and operational the formulation or design of the concept, the more difficult it becomes to develop a distinctive method. With reference to cluster analysis, the objective is to detect clusters, the concept determines the desirable characteristics of a cluster (clustering criteria), the method of clustering specifies how the clusters showing these characteristics can be found, and the method itself is implemented with the help of a computational algorithm. There probably is general agreement about these four requisites of the analysis of data, particularly in the case of cluster analysis. The distinction between methods and algorithms, for example, has always been of concern (see e.g. Jardine and Si