Detecting Ordinal Subcascades


Ludwig Lausser1 · Lisa M. Schäfer1 · Silke D. Kühlwein1 · Angelika M. R. Kestler3 · Hans A. Kestler1,2

Accepted: 1 October 2020
© The Author(s) 2020

Abstract

Ordinal classifier cascades are constrained by a hypothesised order of the semantic class labels of a dataset. This order determines the overall structure of the decision regions in feature space. Assuming the correct order of these class labels allows a high generalisation performance, while an incorrect one leads to diminished results. In this way, ordinal classifier systems can facilitate explorative data analysis by allowing one to screen for potential candidate orders of the class labels. Previously, we have shown that screening is possible for total orders of all class labels. However, as datasets might comprise samples of ordinal as well as non-ordinal classes, the assumption of a total ordering might not be appropriate. An analysis of subsets of classes is required to detect such hidden ordinal substructures. In this work, we devise a novel screening procedure for exhaustive evaluations of all order permutations of all subsets of classes by bounding the number of enumerations we have to examine. Experiments with multi-class data from diverse applications revealed ordinal substructures that generate new relations and support known ones.

Keywords: Ordinal classification · Classifier cascades · Error bounds · Subsets · Supersets
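To give a rough sense of the search space that the abstract refers to, the following minimal sketch enumerates every ordered arrangement of every subset of a set of class labels. It is an illustration only, not the screening procedure of the paper: it performs no classifier training and no bounding, and the function names `candidate_orders` and `count_candidate_orders` as well as the minimum subset size of two are assumptions made for this example.

```python
from itertools import combinations, permutations
from math import factorial


def candidate_orders(classes, min_size=2):
    """Yield each ordered arrangement of each subset of `classes`.

    `min_size` is an assumption of this sketch: subsets with fewer
    than two classes carry no order information, so they are skipped.
    """
    for size in range(min_size, len(classes) + 1):
        for subset in combinations(classes, size):
            # Every permutation of the subset is one candidate class order.
            yield from permutations(subset)


def count_candidate_orders(n_classes, min_size=2):
    """Count the candidates without enumerating them.

    For subsets of size k there are n! / (n - k)! ordered arrangements.
    """
    return sum(
        factorial(n_classes) // factorial(n_classes - k)
        for k in range(min_size, n_classes + 1)
    )


if __name__ == "__main__":
    labels = ["a", "b", "c", "d"]  # hypothetical class labels
    orders = list(candidate_orders(labels))
    print(len(orders), count_candidate_orders(len(labels)))
```

The count grows factorially with the number of classes, which is why an exhaustive screening of this kind depends on limiting the number of candidates that actually have to be examined.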

Ludwig Lausser and Lisa M. Schäfer have contributed equally to this work.

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11063-020-10362-0) contains supplementary material, which is available to authorized users.

Hans A. Kestler [email protected]; [email protected]

1 Institute of Medical Systems Biology, Albert-Einstein-Allee 11, 89081 Ulm, Germany
2 Leibniz Institute on Aging – Fritz Lipmann Institute, 07745 Jena, Germany
3 Internal Medicine I, University Hospital Ulm, 89069 Ulm, Germany

1 Introduction

Extensive data collections are considered valuable resources for hypothesis generation as well as theory building and confirmation. Providing samples of major concepts or categories, they can be seen as the foundation of data-driven learning and reasoning. Classical machine learning techniques focus on the discrimination of individual concepts. Depending on the chosen model type, they allow an interpretation of the underlying discrimination rule and the generation of hypotheses on its intrinsic characteristics [10,27,45]. For example, features with a high impact on a decision boundary can be reported when screening for potential causes of class differences [21,34,36]. Dense regions can be extracted when the definition of prototypic cases is of interest [16]. Combined with external domain knowledge, classification models can also give hints to higher-level processes involved [35,47]. Hypotheses on the relations of the categories are only rarely provided [9,22,52]. Nevertheless, they have tremendous explanatory pot