Incremental predictive clustering trees for online semi-supervised multi-target regression


Aljaž Osojnik1 · Panče Panov2 · Sašo Džeroski1

Received: 16 July 2019 / Revised: 8 July 2020 / Accepted: 19 September 2020
© The Author(s) 2020

Abstract

In many application settings, labeling data examples is costly, while unlabeled examples are abundant and cheap to produce. Labeling can be particularly problematic in an online setting, where arbitrarily many examples can arrive at high frequency. It is also problematic when we need to predict complex values (e.g., multiple real values), a task that has started receiving considerable attention, but mostly in the batch setting. In this paper, we propose a method for online semi-supervised multi-target regression. It is based on incremental trees for multi-target regression and the predictive clustering framework. Furthermore, it utilizes unlabeled examples to improve its predictive performance compared to using just the labeled examples. We compare the proposed iSOUP-PCT method with supervised tree methods, which do not use unlabeled examples, and with an oracle method, which uses unlabeled examples as though they were labeled. Additionally, we compare the proposed method with the available state-of-the-art methods. The method achieves good predictive performance at the cost of increased consumption of computational resources compared to its supervised variant. The proposed method also outperforms the state of the art when very few labeled examples are available, while achieving comparable performance when labeled examples are more common.

Keywords: Multi-target regression · Data stream mining · Semi-supervised learning · Predictive clustering

Editors: Larisa Soldatova, Joaquin Vanschoren.

* Aljaž Osojnik

1 Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia
2 Jožef Stefan International Postgraduate School, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia






Machine Learning

1 Introduction

Recently, there has been a lot of interest in the research community in developing methods for predicting complex values. One such predictive learning task is multi-target regression (MTR), where we want to predict multiple continuous values, called targets, at the same time. The targets are assumed to be related, but equally important. Methods for MTR can be used directly to produce predictive models, or they can be used as components of more complex systems, e.g., recommender systems. Methods for MTR are fairly common in the regular batch learning setting, but rarer in the online learning setting. In the batch learning setting, the entire dataset is available at the start of the learning process, and the order of the examples in the dataset is generally assumed to have no impact on the learning process. In online learning, the entire dataset is not available at the start of the learning