Feature ranking for multi-target regression

Matej Petković1,2 · Dragi Kocev1,2 · Sašo Džeroski1,2

Received: 1 July 2018 / Revised: 12 June 2019 / Accepted: 6 July 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Abstract
In this work, we address the task of feature ranking for multi-target regression (MTR). The task of MTR concerns problems with multiple continuous dependent/target variables, where the goal is to learn a model for predicting all of them simultaneously. This task is receiving increasing attention from the research community, but feature ranking in the context of MTR has not been studied thus far. Here, we study two groups of feature ranking scores for MTR: scores (Symbolic, Genie3 and Random Forest score) based on ensembles (bagging, random forests, extra trees) of predictive clustering trees, and a score derived as an extension of the RReliefF method. We also propose a generic data-transformation approach to MTR feature ranking and thus have two versions of each score. For both groups of feature ranking scores, we analyze their theoretical computational complexity. For the extension of the RReliefF method, we additionally derive some theoretical properties of the scores. Next, we extensively evaluate the scores on 24 benchmark MTR datasets, in terms of the quality of the ranking and the computational complexity of producing it. The results identify the parameters that influence the quality of the rankings, reveal that both groups of methods produce relevant feature rankings, and show that the Symbolic and Genie3 scores, coupled with random forest ensembles, yield the best rankings.

Keywords Feature ranking · Multi-target regression · Tree-based methods · Relief

Editors: Takuya Kida, Takeaki Uno, Tetsuji Kuboyama, Akihiro Yamamoto.

Matej Petković [email protected]
Dragi Kocev [email protected]
Sašo Džeroski [email protected]

1 Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
2 Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia

Machine Learning

1 Introduction
Single target regression (STR) is the predictive modeling task of learning a model able to predict the values of a single numeric target variable. STR can be generalized to multi-target regression (MTR), where the goal is to learn a model that predicts several (at least two) target variables simultaneously. While STR is a well-established research topic, MTR has only recently begun to attract interest in the research community (Kocev et al. 2013; Spyromitros-Xioufis et al. 2016; Borchani et al. 2015). MTR is a structured output prediction task with applications in a wide range of real-life problems. Prominent examples of MTR come from ecology and include predicting the abundance of different species sharing the same habitat (Džeroski et al. 2000), predicting forest properties (Kocev et al. 2009), chemometrics to infer concentrations of several analytes from multivariate calibration using multivariate