Active Learning for Regression Based on Query by Committee
Abstract. We investigate a committee-based approach for active learning of real-valued functions. This is a variance-only strategy for selection of informative training data. As such it is shown to suffer when the model class is misspecified since the learner’s bias is high. Conversely, the strategy outperforms passive selection when the model class is very expressive since active minimization of the variance avoids overfitting.
1 Introduction
In process control we might wish to identify the effect of factors such as temperature, pH, etc. on output, but obtaining such information, for example by running the system at various temperatures, pHs, etc., may be costly. In query learning, our goal is to provide criteria that a learning algorithm can employ to improve its performance by actively selecting the data that are most informative. Given a small initial sample, such a criterion might indicate that the system be run at particular temperatures, pHs, etc. in order for the relationship between these controls and the output to be better characterized.

We focus on supervised learning. Many machine learning algorithms are passive in that they receive a set of labelled data and then estimate the relationship from these data. We investigate a committee-based approach for actively selecting instantiations of the input variables x that should be labelled and incorporated into the training set. We restrict ourselves to the case where the training set is augmented one data point at a time, and assume that an experiment to gain the label y for an instance x is costly but computation is cheap. We investigate under what circumstances committee-based active learning requires fewer queries than passive learning.

Query by committee (QBC) was proposed by Seung, Opper and Sompolinsky [1] for active learning of classification problems. A committee of learners is trained on the available labelled data by the Gibbs algorithm. This selects a hypothesis at random from those consistent with the currently labelled data. The next query is chosen as that on which the committee members have maximal disagreement. They considered two toy models with perfectly realizable
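The selection rule above can be sketched for the regression setting. The following is a minimal illustration, not the paper's exact procedure: it approximates the Gibbs algorithm's random hypothesis draws with bootstrap resampling (a common practical stand-in), fits a small committee of polynomial regressors, and queries the pool point on which the committee's predictions have the largest variance. The function name, committee size, and polynomial model are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def committee_variance_query(X_lab, y_lab, X_pool, n_members=5, degree=2):
    """Return the index of the pool point on which a committee of
    polynomial regressors disagrees most (largest prediction variance).

    Committee diversity comes from bootstrap resampling of the labelled
    data, used here as a practical surrogate for the Gibbs algorithm.
    """
    preds = np.empty((n_members, len(X_pool)))
    n = len(X_lab)
    for m in range(n_members):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        coeffs = np.polyfit(X_lab[idx], y_lab[idx], degree)
        preds[m] = np.polyval(coeffs, X_pool)
    disagreement = preds.var(axis=0)                # variance across members
    return int(np.argmax(disagreement))
```

With a quadratic target and a pool containing both an interpolation point and extrapolation points, the extrapolation points attract the query, since committee predictions diverge most where the labelled data constrain the fit least.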
Thanks to Hugh Mallinson for initial inspiration. This work is supported by EPSRC grant reference S47649.
H. Yin et al. (Eds.): IDEAL 2007, LNCS 4881, pp. 209–218, 2007. © Springer-Verlag Berlin Heidelberg 2007
R. Burbidge, J.J. Rowland, and R.D. King
targets. The algorithm was implemented in the query filtering paradigm; the learner is given access to a stream of inputs drawn at random from the input distribution. With a two-member committee, any input on which the committee members make opposite predictions causes maximal disagreement and its label is queried. It was shown under these conditions that generalization error decreases exponentially with the number of labelled examples, whereas for random queries (i.e. passive learning) generalization error decreases only with an inverse power law. Freund et al. [2] showed that QBC is an efficient query algorithm for the perceptron
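The two-member filtering scheme described above can be sketched as follows. This is an illustrative sketch, not the original implementation: the committee is represented as two linear classifiers (hypothetical weight vectors standing in for Gibbs-sampled hypotheses), and an input from the stream is queried exactly when the two members predict opposite signs.

```python
import numpy as np

def qbc_filter(stream, committee):
    """Stream-based query filtering with a two-member committee.

    An input's label is queried only when the two members make
    opposite predictions, i.e. when disagreement is maximal.
    Each committee member is a linear classifier given by a
    weight vector w, predicting sign(w . x).
    """
    queries = []
    for x in stream:
        votes = [np.sign(w @ x) for w in committee]
        if votes[0] != votes[1]:        # opposite predictions: query
            queries.append(x)
    return queries
```

For example, with members w1 = (1, 0) and w2 = (0, 1), the input (1, 1) is classified positively by both and passes through unlabelled, while (1, -1) splits the committee and triggers a query.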