Active Learning Algorithms for Multi-label Data

E.A. Cherman et al.

1 Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, SP, Brazil {echerman,mcmonard}@icmc.usp.br
2 Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

Abstract. Active learning is an iterative supervised learning task in which learning algorithms can actively query an oracle, i.e. a human annotator who understands the nature of the problem, for labels. Because the learner is allowed to interactively choose the data from which it learns, it is expected to perform better with less training. The active learning approach is appropriate for machine learning applications where training labels are costly to obtain but unlabeled data is abundant. Although active learning has been widely studied for single-label learning, this is not the case for multi-label learning, where objects can have more than one class label and a multi-label learner is trained to assign multiple labels to an object simultaneously. We discuss the key issues that need to be considered in pool-based multi-label active learning and examine how existing solutions in the literature deal with each of these issues. We then implement the existing solutions in a common framework and empirically study their performance on two multi-label datasets with different characteristics, under two different application settings (transductive and inductive). We obtain interesting results, which we attribute mainly to the properties of the datasets and secondarily to the application settings.

Keywords: Supervised learning · Multi-label learning · Active learning · Pool-based strategies

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 267–279, 2016. DOI: 10.1007/978-3-319-44944-9_23

1 Introduction

Different approaches to enhancing supervised learning have been proposed over the years. Because supervised learning algorithms build classifiers from labeled training examples, several of these approaches aim to reduce the time and effort needed to obtain labeled training data. Active learning is one of these approaches [6]. The key idea of active learning is to minimize labeling costs by allowing the learner to query for the labels of the most informative unlabeled instances. These queries are posed to an oracle, e.g. a human annotator who understands the nature of the problem. In this way, an active learner can substantially reduce the amount of labeled data required to construct the classifier. Active learning has been developed extensively to support single-label learning, where each object (instance) in the dataset is associated with only one class label. However, the same is not true of multi-label learning, where each object is associated with a subset of labels. Due to the large number of real-world problems that fall into this cate