Imbalanced regression and extreme value prediction



Rita P. Ribeiro¹,² · Nuno Moniz¹,²

Received: 15 January 2020 / Revised: 31 July 2020 / Accepted: 11 August 2020

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Research in imbalanced domain learning has almost exclusively focused on solving classification tasks for accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks are still scarce due to two main factors. First, standard regression tasks assume that all domain values are equally important. Second, standard evaluation metrics focus on assessing the performance of models on the most common values of data distributions. In this paper, we present an approach to tackle imbalanced regression tasks where the objective is to predict extreme (rare) values. We propose an approach to formalise such tasks and to optimise/evaluate predictive models, overcoming the factors mentioned above as well as issues in related work. We present an automatic and non-parametric method to obtain relevance functions, building on the concept of relevance as the mapping of target values into non-uniform domain preferences. Then, we propose SERA, a new evaluation metric capable of assessing the effectiveness of models in predicting extreme values, and of optimising them towards that goal, while penalising severe model bias. An experimental study demonstrates how SERA provides valid and useful insights into the performance of models in imbalanced regression tasks.

Keywords: Supervised learning · Imbalanced domain learning · Imbalanced regression · Extreme value prediction
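The abstract names two ingredients that work together: a relevance function φ that maps target values to non-uniform domain preferences in [0, 1], and the SERA metric built on top of it. The Python sketch below illustrates both under explicit assumptions. The relevance function `boxplot_relevance` is a hypothetical, simplified stand-in (relevance 0 at the median, 1 at and beyond the boxplot adjacent values, linear interpolation in between) and is not necessarily the paper's exact construction. For `sera` (also a hypothetical name) we assume the formulation SER_t = Σ (ŷ_i − y_i)² over the cases with φ(y_i) ≥ t, and SERA = ∫₀¹ SER_t dt, approximated here by the trapezoidal rule. Only the standard library is used (`statistics.quantiles` needs Python 3.8+).

```python
import statistics


def boxplot_relevance(train_y):
    """Build a hypothetical relevance function phi(y) -> [0, 1] from boxplot statistics.

    Simplified stand-in: the median gets relevance 0, the boxplot adjacent
    values (whisker ends) and anything beyond them get relevance 1, and values
    in between are linearly interpolated.
    """
    median = statistics.median(train_y)
    q1, _, q3 = statistics.quantiles(train_y, n=4)
    iqr = q3 - q1
    # Adjacent values: the most extreme observations inside the 1.5 * IQR fences.
    low = min(v for v in train_y if v >= q1 - 1.5 * iqr)
    high = max(v for v in train_y if v <= q3 + 1.5 * iqr)

    def phi(y):
        if y >= median:
            if high <= median:  # degenerate upper tail: nothing to interpolate over
                return 1.0 if y > median else 0.0
            return min(1.0, (y - median) / (high - median))
        if low >= median:  # degenerate lower tail
            return 1.0
        return min(1.0, (median - y) / (median - low))

    return phi


def sera(y_true, y_pred, phi, steps=1000):
    """Approximate SERA = integral of SER_t over t in [0, 1] (trapezoidal rule).

    SER_t is the sum of squared errors over the cases whose relevance phi(y) >= t.
    """
    rel = [phi(y) for y in y_true]
    sq_err = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]

    def ser(t):
        return sum(e for r, e in zip(rel, sq_err) if r >= t)

    values = [ser(i / steps) for i in range(steps + 1)]
    # Trapezoidal rule on a uniform grid: full weight inside, half weight at the ends.
    return (sum(values) - 0.5 * (values[0] + values[-1])) / steps


y_true = [10, 11, 12, 12, 13, 13, 14, 15, 16, 40]
phi = boxplot_relevance(y_true)

miss_extreme = [35 if y == 40 else y for y in y_true]  # error of 5 on the rare value
miss_central = list(y_true)
miss_central[4] = 18  # same error of 5, on a median value (y = 13)

print(sera(y_true, miss_extreme, phi))  # 25.0
print(sera(y_true, miss_central, phi))  # close to 0
```

Both prediction vectors have the same total squared error (25), so MSE would rate them identically, yet SERA separates them: the miss on the rare value 40 (relevance 1) counts at every threshold t, whereas the miss on a median value (relevance 0) only enters at t = 0, so its trapezoidal estimate is 12.5/steps and vanishes as the grid is refined.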

Journal: Machine Learning
Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.
Corresponding author: Nuno Moniz

¹ INESC TEC, Porto, Portugal
² Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal

Fig. 1 Illustration of the importance of values for a target variable distribution in a regression task: the assumption of uniform domain preferences (dashed green), as in standard regression tasks, and our objective, non-uniform domain preferences biased to extreme values (black). Axes: target value y vs. Importance (0.0 to 1.0). (Color figure online)

1 Introduction

The primary assumption of standard supervised learning tasks is that every value of the domain is equally important. However, this assumption does not always hold. In domains such as finance, meteorology or environmental sciences, the goal is often to predict uncommon events, also known as rare or extreme cases. Imbalanced domain learning tasks formalise such predictive modelling scenarios. These tasks have two characteristics (Branco et al. 2016): (i) a skewed distribution of the target variable and (ii) a domain preference for underrepresented cases. Research on imbalanced domain learning spans more than 20 years and addresses various aspects (Fernández et al. 2018; Branco et al. 2016; López et al. 2013; Kr