Active feature acquisition on data streams under feature drift
Christian Beyer¹ · Maik Büttner¹ · Vishnu Unnikrishnan¹ · Miro Schleicher¹ · Eirini Ntoutsi² · Myra Spiliopoulou¹
Received: 17 December 2019 / Accepted: 1 June 2020 © The Author(s) 2020
Abstract Traditional active learning tries to identify instances for which acquiring the label increases model performance under budget constraints. Less research has been devoted to the task of actively acquiring feature values, where both the instance and the feature must be selected intelligently, and even less to the scenario where instances arrive in a stream with feature drift. We propose an active feature acquisition strategy for data streams with feature drift, as well as an evaluation framework for active feature acquisition. We also implement a baseline that chooses features randomly and compare it against eight different methods in a scenario where we can acquire at most one feature at a time per instance and where all features are considered to cost the same. Our initial experiments on 9 data sets, with 7 degrees of missing features and 8 budgets, show that our methods outperform random acquisition on 7 data sets and perform comparably on the remaining 2.
Keywords Active feature acquisition · Data streams · Feature drift
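To make the evaluation scenario concrete, the following is a minimal sketch of a random acquisition baseline on a stream, not the authors' implementation. It assumes that every feature costs one unit, that at most one missing value is bought per instance, and that the budget is the maximum fraction of seen instances for which a value may be bought. The budget semantics, the function name `random_afa`, and the use of the complete instance as a simulated oracle are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, Iterator, Optional, Tuple

# None marks a missing feature value (an assumption of this sketch).
Instance = Dict[str, Optional[float]]


def random_afa(
    stream: Iterable[Tuple[Instance, Instance]],
    budget: float,
    seed: int = 0,
) -> Iterator[Instance]:
    """Yield each instance after at most one random feature acquisition.

    Each stream element is a pair (observed, truth); `truth` plays the
    role of the oracle that reveals a purchased feature value.
    """
    rng = random.Random(seed)
    seen = 0   # instances processed so far
    spent = 0  # feature values bought so far (unit cost each)
    for observed, truth in stream:
        seen += 1
        missing = [name for name, value in observed.items() if value is None]
        result = dict(observed)  # copy so the input stream is not mutated
        # Buy only if spending stays within the assumed budget fraction.
        if missing and (spent + 1) / seen <= budget:
            chosen = rng.choice(missing)
            result[chosen] = truth[chosen]
            spent += 1
        yield result
```

With a budget of 0.5 and a stream in which every instance has missing values, this sketch buys a value for every second instance. An informed strategy would replace `rng.choice(missing)` with a selection based on an estimate of feature usefulness.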
The first-author position is shared between the first two authors, Christian Beyer and Maik Büttner.
¹ Otto-von-Guericke University, Magdeburg, Germany
² Leibniz University, Hannover, Germany
1 Introduction
Active learning (AL) usually concerns a scenario in which labels are scarce and can be acquired at a cost with the help of an oracle. The goal is to intelligently pick instances whose labels, when acquired, improve the performance of our predictive model once it has been trained on the chosen instances. A more recent development considers the scenario in which we acquire missing feature values, rather than labels, at a cost. This is called active feature acquisition (AFA). We propose new AFA methods for data streams. Settles [1] describes the goal of AFA as follows: “The goal in active feature acquisition is to select the most informative features to obtain during training, rather than randomly or exhaustively acquiring all new features for all training instances.” For example, if we want to predict whether a patient has a certain complex disease, we could choose from multiple medical tests and would have to find a trade-off between which tests are the most predictive and which are the cheapest. The result of a test is the value of a feature that was initially missing, and we would like to have a strategy that tells us if we still need to acquire features f