An empirical analysis of binary transformation strategies and base algorithms for multi-label learning



Adriano Rivolli¹ · Jesse Read² · Carlos Soares³ · Bernhard Pfahringer⁴ · André C. P. L. F. de Carvalho⁵

Received: 24 April 2018 / Revised: 9 January 2020 / Accepted: 7 April 2020

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Investigating strategies that are able to efficiently deal with multi-label classification tasks is a current research topic in machine learning. Many methods have been proposed, making the selection of the most suitable strategy a challenging issue. From this premise, this paper presents an extensive empirical analysis of the binary transformation strategies and base algorithms for multi-label learning. This subset of strategies uses the one-versus-all approach to transform the original data, generating one binary data set per label, upon which any binary base algorithm can be applied. Considering that the influence of the base algorithm on the predictive performance obtained by the strategies has not been considered in depth by many empirical studies, we investigated the influence of distinct base algorithms on the performance of several strategies. Thus, this study covers a family of multi-label strategies using a diversified range of base algorithms, exploring their relationship from different perspectives. This finding has significant implications concerning the methodology of evaluation adopted in multi-label experiments containing binary transformation strategies, given that multiple base algorithms should be considered. Despite these improvements in strategy and base algorithms, for many data sets, a large number of labels, mainly those less frequent, were either never predicted, or always misclassified. We conclude the experimental analysis by recommending strategies and base algorithms in accordance with different performance criteria.

Keywords: Multi-label learning · Binary transformation · Comparison of strategies · Base algorithms · Empirical analysis
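The one-versus-all transformation described in the abstract can be illustrated with a minimal sketch. It shows only the simplest member of the family, usually called binary relevance: one binary data set is built per label and an independent base model is trained on each. The function names, the toy data, and the `OneNearestNeighbour` stand-in base algorithm are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
BinaryDataset = Tuple[List[Instance], List[int]]


def binary_transform(X: List[Instance], Y: List[Sequence[int]]) -> List[BinaryDataset]:
    """One-versus-all: return one binary data set per label (column of Y)."""
    n_labels = len(Y[0])
    return [(list(X), [row[j] for row in Y]) for j in range(n_labels)]


class OneNearestNeighbour:
    """Hypothetical stand-in base algorithm; any binary learner could be used."""

    def fit(self, X: List[Instance], y: List[int]) -> "OneNearestNeighbour":
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x: Instance) -> int:
        # Squared Euclidean distance to every stored training instance.
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


def fit_binary_relevance(
    X: List[Instance],
    Y: List[Sequence[int]],
    make_base: Callable[[], OneNearestNeighbour],
) -> List[OneNearestNeighbour]:
    """Train one independent base model per binary data set."""
    return [make_base().fit(Xj, yj) for Xj, yj in binary_transform(X, Y)]


def predict_labels(models: List[OneNearestNeighbour], x: Instance) -> List[int]:
    """Concatenate the per-label binary predictions into one label vector."""
    return [m.predict_one(x) for m in models]


# Toy data (assumed): 4 instances, 2 features, 2 labels.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

models = fit_binary_relevance(X, Y, OneNearestNeighbour)
prediction = predict_labels(models, [0.9, 0.1])
print(prediction)
```

Because the base learner is passed in as a factory, swapping it for another binary algorithm changes nothing else in the sketch, which is the property that makes the choice of base algorithm an independent experimental factor.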

Editor: Eyke Hüllermeier. Corresponding author: Adriano Rivolli.




Machine Learning

1 Introduction

Multi-label learning has been investigated widely by the machine learning community in recent years (de Carvalho and Freitas 2009; Tsoumakas et al. 2010; Gibaja and Ventura 2014). It deals with classification tasks where an instance can be simultaneously classified into more than one class. Each class is represented by one label. Several domains, such as text (Klimt and Yang 2004; Pestian et al. 2007), multimedia (Duygulu et al. 2002; Zhou and Zhang 2006; Briggs et al. 2013) and biology (Elisseeff and Weston 2001), are intrinsically multi-label. A common approach to dealing with multi-label classification tasks is to transform the original data set into one or more single-label data sets. A conventional binary classification algorithm, called base algorithm here, is