On the Problem of Error Propagation in Classifier Chains for Multi-label Classification


Abstract. So-called classifier chains have recently been proposed as an appealing method for tackling the multi-label classification task. In this paper, we analyze the influence of a potential pitfall of the learning process, namely the discrepancy between the feature spaces used in training and testing: while true class labels are used as supplementary attributes for training the binary models along the chain, the same models need to rely on estimates of these labels when making a prediction. We provide first experimental results suggesting that the attribute noise thus created can affect the overall prediction performance of a classifier chain.

R. Senge · E. Hüllermeier
Philipps-Universität Marburg, Marburg, Germany
e-mail: [email protected]; [email protected]

J.J. del Coz
University of Oviedo, Gijón, Spain
e-mail: [email protected]

M. Spiliopoulou et al. (eds.), Data Analysis, Machine Learning and Knowledge Discovery, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-319-01595-8__18, © Springer International Publishing Switzerland 2014

1 Introduction

Multi-label classification (MLC) has attracted increasing attention in the machine learning community during the past few years (Tsoumakas and Katakis 2007). The goal in MLC is to induce a model that assigns a subset of labels to each example, rather than a single label as in multi-class classification. For instance, on a news website, a multi-label classifier can automatically attach several labels (usually called tags in this context) to every article; the tags can be helpful for searching related news or for briefly informing users about their content. Current research on MLC is largely driven by the idea that optimal prediction performance can only be achieved by modeling and exploiting statistical dependencies between labels. Roughly speaking, if the relevance of one label may depend on the relevance of others, then labels should be predicted simultaneously and not separately. This is the main argument against simple decomposition techniques such as binary relevance (BR) learning, which splits the original multi-label task into several independent binary classification problems, one for each label. Until now, several methods for capturing label dependence have been proposed in the literature, including a method called classifier chains (CC) (Read et al. 2011). This method enjoys great popularity, even though it was introduced only recently. As its name suggests, CC selects an order on the label set, a chain of labels, and trains a binary classifier for each label in this order. The difference with respect to BR is that the feature space used to induce each classifier is extended by the previous labels in the chain. These labels are treated as additional attributes, with the goal of modeling conditional dependence between a label and its predecessors. CC performs particularly well when used in an ensemble framework, usually denoted as an ensemble of classifier chains (ECC).
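To make the mechanism concrete, the following is a minimal sketch of a classifier chain, not the authors' implementation: it assumes scikit-learn's LogisticRegression as the base learner and a fixed, user-supplied label order. Note how training appends the true predecessor labels to the feature space, while prediction must substitute the chain's own estimates, which is exactly the train/test discrepancy analyzed in this paper.

```python
# Illustrative classifier chain (CC) sketch; class and parameter names
# are chosen for this example and are not part of any library API.
import numpy as np
from sklearn.linear_model import LogisticRegression

class ClassifierChain:
    def __init__(self, order):
        self.order = order   # fixed label order: the "chain"
        self.models = []

    def fit(self, X, Y):
        # Training: each binary model sees the original features plus
        # the TRUE labels of its predecessors in the chain.
        self.models = []
        for i, label in enumerate(self.order):
            X_ext = np.hstack([X, Y[:, self.order[:i]]])
            self.models.append(LogisticRegression().fit(X_ext, Y[:, label]))
        return self

    def predict(self, X):
        # Prediction: true labels are unavailable, so each model must
        # rely on the ESTIMATES produced earlier in the chain -- the
        # source of the attribute noise discussed in the paper.
        n = X.shape[0]
        P = np.zeros((n, 0))                       # predecessor estimates
        Y_hat = np.zeros((n, len(self.order)), dtype=int)
        for i, label in enumerate(self.order):
            y = self.models[i].predict(np.hstack([X, P]))
            Y_hat[:, label] = y
            P = np.hstack([P, y.reshape(-1, 1)])
        return Y_hat
```

With two perfectly correlated labels, the second model in the chain can learn to copy its predecessor's column; at test time, however, any mistake of the first model is propagated to the second, which is the error-propagation effect studied here.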