Active Learning Algorithms for Multi-label Data



1 Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, SP, Brazil, {echerman,mcmonard}@icmc.usp.br
2 Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece, [email protected]

Abstract. Active learning is an iterative supervised learning task where learning algorithms can actively query an oracle, i.e. a human annotator that understands the nature of the problem, for labels. As the learner is allowed to interactively choose the data from which it learns, it is expected that the learner will perform better with less training. The active learning approach is appropriate to machine learning applications where training labels are costly to obtain but unlabeled data is abundant. Although active learning has been widely considered for single-label learning, this is not the case for multi-label learning, where objects can have more than one class label and a multi-label learner is trained to assign multiple labels simultaneously to an object. We discuss the key issues that need to be considered in pool-based multi-label active learning and how existing solutions in the literature deal with each of these issues. We further empirically study the performance of the existing solutions, after implementing them in a common framework, on two multi-label datasets with different characteristics and under two different application settings (transductive and inductive). We find interesting results, which we attribute mainly to the properties of the datasets and secondarily to the application settings.

Keywords: Supervised learning · Multi-label learning · Active learning · Pool-based strategies

1 Introduction

Different approaches to enhance supervised learning have been proposed over the years. As supervised learning algorithms build classifiers based on labeled training examples, several of these approaches aim to reduce the amount of time and effort needed to obtain labeled data for training. Active learning is one of these approaches [6]. The key idea of active learning is to minimize labeling costs by allowing the learner to query for the labels of the most informative unlabeled data instances. These queries are posed to an oracle, e.g. a human annotator, who understands the nature of the problem. This way, an active learner can substantially reduce the amount of labeled data required to construct the classifier. Active learning has been developed substantially to support single-label learning, where each object (instance) in the dataset is associated with only one class label. However, this is not the case in multi-label learning, where each object is associated with a subset of labels. Due to the large number of real-world problems which fall into this cate

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. E.A. Cherman et al., in L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 267–279, 2016. DOI: 10.1007/978-3-319-44944-9_23
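The pool-based query loop described in the introduction can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. It assumes a binary-relevance learner (one logistic regression per label, trained by plain gradient descent), an uncertainty score averaged over labels, and a simulated oracle that reveals hidden ground-truth labels. All function names, hyperparameters, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_binary_relevance(X, Y, lr=0.5, epochs=200):
    """Assumed base learner: one logistic regression per label."""
    Xb = add_bias(X)
    W = np.zeros((Xb.shape[1], Y.shape[1]))
    for _ in range(epochs):
        P = sigmoid(Xb @ W)
        W -= lr * Xb.T @ (P - Y) / Xb.shape[0]  # gradient of the log loss
    return W

def predict_proba(W, X):
    return sigmoid(add_bias(X) @ W)

def uncertainty(P):
    """Per-instance score in [0, 1]: 1 when every label probability is 0.5,
    0 when every label is predicted with full confidence. Averaging over
    labels is one possible aggregation, assumed here."""
    return np.mean(1.0 - np.abs(2.0 * P - 1.0), axis=1)

def active_learning_loop(X_pool, oracle, initial_idx, n_queries):
    """Repeatedly train, score the unlabeled pool, and query the oracle
    for the labels of the most uncertain instance."""
    labeled = list(initial_idx)
    known = {i: oracle(i) for i in labeled}
    for _ in range(n_queries):
        unlabeled = [i for i in range(len(X_pool)) if i not in known]
        if not unlabeled:
            break
        Y_lab = np.array([known[i] for i in labeled])
        W = fit_binary_relevance(X_pool[labeled], Y_lab)
        scores = uncertainty(predict_proba(W, X_pool[unlabeled]))
        query = unlabeled[int(np.argmax(scores))]
        known[query] = oracle(query)  # the oracle supplies the full label set
        labeled.append(query)
    Y_lab = np.array([known[i] for i in labeled])
    return fit_binary_relevance(X_pool[labeled], Y_lab), labeled

# Synthetic pool with 3 labels; the "oracle" simply looks up hidden labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
Y_demo = (X_demo @ rng.normal(size=(5, 3)) > 0).astype(float)
W_demo, labeled_demo = active_learning_loop(
    X_demo, lambda i: Y_demo[i], initial_idx=range(5), n_queries=20
)
print("instances labeled:", len(labeled_demo))
```

Querying one instance per iteration and asking for all of its labels at once is only one design option. Other choices, such as querying in batches or combining per-label scores with a maximum instead of a mean, fit the same loop by changing `uncertainty` or the selection step.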
