Per-sample prediction intervals for extreme learning machines


ORIGINAL ARTICLE

Anton Akusok1 · Yoan Miche2 · Kaj-Mikael Björk3 · Amaury Lendasse4

Received: 30 September 2016 / Accepted: 26 December 2017
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract

Prediction intervals in supervised machine learning bound the region where the true outputs of new samples may fall. They are necessary for separating reliable predictions of a trained model from near-random guesses, for minimizing the rate of false positives, and for other problem-specific tasks in applied machine learning. Many real problems have heteroscedastic stochastic outputs, which explains the need for input-dependent prediction intervals. This paper proposes to estimate input-dependent prediction intervals with a separate extreme learning machine model, using the variance of its predictions as a correction term that accounts for model uncertainty. The variance is estimated from the model's linear output layer with a weighted Jackknife method. The methodology is very fast, robust to heteroscedastic outputs, and handles both extremely large datasets and an insufficient amount of training data.

Keywords ELM · Heteroscedastic · Prediction interval · Confidence interval · Variance estimation · False positives · Coverage
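The following is a minimal sketch of the general idea described in the abstract: one ELM predicts the conditional mean, and a separate ELM predicts an input-dependent noise level used to widen or narrow the interval per sample. It assumes a basic tanh ELM with a pseudo-inverse output layer and a toy dataset; the paper's actual method additionally corrects for model uncertainty with a weighted Jackknife estimate of the output-layer variance, which is omitted here. All function names and data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    """Fit a basic ELM: random hidden layer + least-squares linear output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden layer activations
    beta = np.linalg.pinv(H) @ y                 # output weights (linear layer)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy heteroscedastic data: noise standard deviation grows with |x|
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.2 * np.abs(X[:, 0]))

# 1) Main ELM predicts the conditional mean
mean_model = elm_fit(X, y)
residuals = y - elm_predict(mean_model, X)

# 2) Second ELM predicts log squared residuals, i.e. input-dependent variance
var_model = elm_fit(X, np.log(residuals ** 2 + 1e-8))

# Per-sample 95% prediction interval under a Gaussian noise assumption
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
mu = elm_predict(mean_model, X_new)
sigma = np.sqrt(np.exp(elm_predict(var_model, X_new)))
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma
print(np.c_[mu, lower, upper])
```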

1 Introduction

Practical applications of machine learning can be problematic in the sense that developers and practitioners often do not fully trust their models' predictions. A fundamental reason for this mistrust is that mean squared error (MSE) and other error measures averaged over a dataset are commonly used to evaluate the performance of a method or to compare different methods. Averaged error measures are unfit for business processes where each particular sample is important, as it represents a customer or another existing entity [1]. On the other hand, applied machine learning models may skip some data samples because they are only one part of a larger process structure, and uncertain data can be handed to human experts [2].

* Anton Akusok
  [email protected]

1 Arcada University of Applied Sciences, Helsinki, Finland
2 Nokia Solutions and Networks Group, Espoo, Finland
3 Risklab at Arcada University of Applied Sciences, Helsinki, Finland
4 Department of Mechanical and Industrial Engineering and the Iowa Informatics Initiative, The University of Iowa, Iowa City, USA

The trust problem can be solved by computing a sample-specific confidence value [3]. Then predictions with high confidence (and enough trust in them) are used, while data samples with uncertain predictions are passed to the next analytical stage. The machine learning model works as a filter, solving easy cases automatically with confident predictions and reducing the amount of data remaining to be analyzed [4].

Let {𝐱_i, y_i}, i ∈ [1, N] be a dataset where the outputs y are independently drawn from a normal distribution conditioned on the inputs 𝐱:

y_i ∼ N(f(𝐱_i), σ_i²)    (1)

This dataset has heteroscedastic noise because the variance σ_i² depends on the particular input sample 𝐱_i.
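A short numeric illustration of Eq. (1) follows. It assumes a toy mean function f(x) = sin(x) and a noise level that grows with |x|; both are illustrative choices, not taken from the paper. It also shows why a single constant-width interval cannot achieve the nominal coverage everywhere once the noise is heteroscedastic, which motivates input-dependent intervals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Sample a dataset following Eq. (1): y_i ~ N(f(x_i), sigma_i^2),
# with toy mean f(x) = sin(x) and a noise level that grows with |x|.
N = 10_000
x = rng.uniform(-3, 3, size=N)
sigma = 0.1 + 0.3 * np.abs(x)            # input-dependent standard deviation
y = np.sin(x) + rng.normal(scale=sigma)

# A single global interval (+/- 1.96 * average sigma) over- and under-covers
# depending on the region of the input space.
half_width = 1.96 * sigma.mean()
inside = np.abs(y - np.sin(x)) <= half_width
print("coverage where |x| < 1 :", inside[np.abs(x) < 1].mean())  # well above 95%
print("coverage where |x| > 2 :", inside[np.abs(x) > 2].mean())  # well below 95%
```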