On the Selection of the Regularization Parameter in Stacking

Tadayoshi Fushiki

Accepted: 15 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Stacking is a model combination technique for improving prediction accuracy. Regularization is usually necessary in stacking because some of the models being combined produce similar predictions. Cross-validation is generally used to select the regularization parameter, but it incurs a high computational cost. This paper proposes two simple, computationally inexpensive methods for selecting the regularization parameter. The effectiveness of the methods is examined in numerical experiments. Asymptotic results in a particular setting are also shown.

Keywords Cross-validation · Model combination · Ridge regression · Stacking

1 Introduction

Stacking [16] is a model combination technique for improving prediction accuracy. Breiman [3], LeBlanc and Tibshirani [9] and Clarke [6] studied the fundamental properties of stacking. LeBlanc and Tibshirani [9] pointed out that [15] had already studied stacking under the name model-mix. Stacking has since been studied in many applications (for example, [11,13,14,17,18]).

Regularization plays an important role in model combination because some of the candidate predictions are similar to one another. Ridge regularization [7] is sometimes used in stacking [14]. Cross-validation is a standard technique for choosing the regularization parameter, but its computational cost is high. Consequently, many studies have avoided optimizing the regularization parameter [9]. In this paper, we examine some approaches that approximate the cross-validation procedure for choosing the regularization parameter in stacking.

This paper is organized as follows. Section 2 defines stacking. In Sect. 3, two methods are proposed to approximate the cross-validation procedure for choosing the regularization parameter in stacking. In Sect. 4, the proposed methods are examined in some examples. Section 5 provides asymptotic results for the proposed methods. Section 6 summarizes the results.
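To make the cost of cross-validating the regularization parameter concrete, the following is a minimal sketch and not one of the methods proposed in this paper. It assumes a common formulation of stacking in which the base predictions are combined linearly, with weights fitted by ridge regression on out-of-fold predictions. The polynomial base models, the synthetic data, the grid of candidate values, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns a prediction function."""
    coef = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coef, x_new)

def out_of_fold_predictions(x, y, degrees, n_folds, counter):
    """Return an N x M matrix whose (i, m) entry is the prediction of base
    model m at x[i], made by a fit that did not see observation i."""
    idx = np.arange(len(x))
    Z = np.empty((len(x), len(degrees)))
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        for m, d in enumerate(degrees):
            Z[held_out, m] = fit_poly(x[train], y[train], d)(x[held_out])
            counter[0] += 1  # one base-model fit
    return Z

def ridge_weights(Z, y, lam):
    """Ridge solution of the combination weights: (Z'Z + lam I) w = Z'y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def select_lambda_by_cv(x, y, degrees, lambdas, n_folds=5):
    """Choose lam by an outer cross-validation wrapped around stacking."""
    counter = [0]  # counts base-model fits to expose the cost
    idx = np.arange(len(x))
    cv_error = np.zeros(len(lambdas))
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        # Inner cross-validation: stacking inputs for this outer training part.
        Z_train = out_of_fold_predictions(
            x[train], y[train], degrees, n_folds, counter)
        # Base models refitted on the whole outer training part.
        test_preds = []
        for d in degrees:
            test_preds.append(fit_poly(x[train], y[train], d)(x[held_out]))
            counter[0] += 1
        Z_test = np.column_stack(test_preds)
        # The ridge step is cheap; the base-model fits above dominate.
        for j, lam in enumerate(lambdas):
            w = ridge_weights(Z_train, y[train], lam)
            cv_error[j] += np.sum((y[held_out] - Z_test @ w) ** 2)
    cv_error /= len(x)
    return lambdas[int(np.argmin(cv_error))], cv_error, counter[0]

# Synthetic data (hypothetical): a smooth signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = np.sin(3.0 * x) + 0.3 * rng.standard_normal(100)

degrees = [1, 2, 3, 4]                              # four base models
lambdas = np.array([1e-3, 1e-2, 1e-1, 1.0, 10.0])   # candidate values
best_lam, cv_error, n_fits = select_lambda_by_cv(x, y, degrees, lambdas)
print("selected lambda:", best_lam)
print("base-model fits:", n_fits)
```

In this sketch, five outer folds each require an inner five-fold pass over four base models plus four refits, so 120 base-model fits are needed before any regularization parameter can be compared. Stacking alone on the full data would need 20 such fits. This nesting is the cost that motivates cheaper approximations.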


Tadayoshi Fushiki, Niigata University, Niigata, Japan


2 Model Selection, Model Averaging and Model Combination

Observations $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ are independently and identically generated from a distribution $p(x, y)$. We consider the problem of constructing a point prediction for $y$ at $x$ from $D$. Models $M_1, \ldots, M_M$ are assumed, and their predictions are given by $f_1(x; D), \ldots, f_M(x; D)$.

Model selection [8] is the approach most often used in statistics to obtain a final prediction. After estimating the prediction error $E_{(x,y) \sim p(x,y)}\left((y - f_m(x; D))^2\right)$ for each $m$, $\hat{m}$ is determined by minimizing the prediction error estimate. The final point prediction is given by $f_{\hat{m}}(x; D)$.

There are several methods in which all predictions are used to construct a final prediction. Model averaging [5] is one such method. In Bayesian model averaging, the prediction is obtained by
