Imbalanced regression and extreme value prediction


Rita P. Ribeiro (1,2) · Nuno Moniz (1,2)

Received: 15 January 2020 / Revised: 31 July 2020 / Accepted: 11 August 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract
Research in imbalanced domain learning has almost exclusively focused on solving classification tasks for accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks are still scarce due to two main factors. First, standard regression tasks assume each domain value as equally important. Second, standard evaluation metrics focus on assessing the performance of models on the most common values of data distributions. In this paper, we present an approach to tackle imbalanced regression tasks where the objective is to predict extreme (rare) values. We propose an approach to formalise such tasks and to optimise/evaluate predictive models, overcoming the factors mentioned and issues in related work. We present an automatic and non-parametric method to obtain relevance functions, building on the concept of relevance as the mapping of target values into non-uniform domain preferences. Then, we propose SERA, a new evaluation metric capable of assessing the effectiveness and of optimising models towards the prediction of extreme values while penalising severe model bias. An experimental study demonstrates how SERA provides valid and useful insights into the performance of models in imbalanced regression tasks.

Keywords
Supervised learning · Imbalanced domain learning · Imbalanced regression · Extreme value prediction
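The two ideas named in the abstract, a relevance function over target values and an evaluation metric weighted towards extreme values, can be illustrated with a small sketch. This excerpt does not define either one, so everything below is an assumption: `relevance` is a hypothetical piecewise-linear stand-in for the automatic, non-parametric method the abstract mentions, and `sera` assumes the metric is the area under the curve of squared error accumulated over cases whose relevance is at least a threshold t, for t in [0, 1]. The control points `zero_at` and `one_at` and the toy data are invented for illustration.

```python
import numpy as np


def relevance(y, zero_at, one_at):
    """Hypothetical piecewise-linear relevance function.

    Maps target values to [0, 1]: 0 at or below `zero_at`, 1 at or above
    `one_at`, linear in between. Both control points are illustrative
    assumptions, not values taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    return np.clip((y - zero_at) / (one_at - zero_at), 0.0, 1.0)


def sera(y_true, y_pred, phi, steps=1000):
    """Assumed form of SERA: area under t -> SER_t for t in [0, 1], where
    SER_t is the sum of squared errors over cases with relevance >= t."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    phi = np.asarray(phi, dtype=float)
    sq_err = (y_pred - y_true) ** 2
    thresholds = np.linspace(0.0, 1.0, steps + 1)
    ser = np.array([sq_err[phi >= t].sum() for t in thresholds])
    # Trapezoidal rule over the relevance thresholds.
    return float(np.sum((ser[1:] + ser[:-1]) / 2.0 * np.diff(thresholds)))


# Toy target: four common values and one extreme value.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 60.0])
phi = relevance(y_true, zero_at=15.0, one_at=50.0)

# Model A is exact on common values but misses the extreme one.
pred_a = np.array([10.0, 12.0, 11.0, 13.0, 40.0])
# Model B is off on common values but exact on the extreme one.
pred_b = np.array([20.0, 22.0, 21.0, 23.0, 60.0])

mse_a = float(np.mean((pred_a - y_true) ** 2))
mse_b = float(np.mean((pred_b - y_true) ** 2))
sera_a = sera(y_true, pred_a, phi)
sera_b = sera(y_true, pred_b, phi)

print(f"MSE  A={mse_a:.2f}  B={mse_b:.2f}")
print(f"SERA A={sera_a:.2f}  B={sera_b:.2f}")
```

In this toy setting both models have the same mean squared error, yet the assumed SERA is far larger for the model that misses the extreme value, which is the kind of distinction the abstract says standard metrics fail to make.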

Machine Learning
Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

1 INESC TEC, Porto, Portugal
2 Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal

1 Introduction
The primary assumption of standard supervised learning tasks is that each value of the domain is equally important. However, this is not always true. In domains such as finance, meteorology or environmental sciences, the goal is often the prediction of uncommon events, also known as rare/extreme cases.

[Figure: importance (0.0 to 1.0) plotted against the target variable y (0 to 60)]
Fig. 1 Illustration of the importance of values for a target variable distribution in a regression task: the assumption of uniform domain preferences (dashed green), as in standard regression tasks, and our objective, nonuniform domain preferences biased to extreme values (black) (Color figure online)

Imbalanced domain learning tasks formalise such predictive modelling scenarios. These have two characteristics (Branco et al. 2016): (i) skewed distribution of target variables and (ii) domain preference for underrepresented cases. Research concerning imbalanced domain learning spans over 20 years, addressing various aspects (Fernández et al. 2018; Branco et al. 2016; López et al. 2013; Kr
