On the Selection of the Regularization Parameter in Stacking



Tadayoshi Fushiki¹
Accepted: 15 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Stacking is a model combination technique for improving prediction accuracy. Regularization is usually necessary in stacking because some of the predictions being combined are very similar to one another. Cross-validation is generally used to select the regularization parameter, but it incurs a high computational cost. This paper proposes two simple methods for selecting the regularization parameter at low computational cost. The effectiveness of the methods is examined in numerical experiments, and asymptotic results are derived in a particular setting.

Keywords: Cross-validation · Model combination · Ridge regression · Stacking

1 Introduction

Stacking [16] is a model combination technique for improving prediction accuracy. Breiman [3], LeBlanc and Tibshirani [9] and Clarke [6] studied its fundamental properties, and LeBlanc and Tibshirani [9] pointed out that stacking had already been studied under the name "model-mix" in [15]. Stacking has since been used in many applications (e.g., [11,13,14,17,18]). Regularization plays an important role in model combination because some of the candidate predictions are very similar to one another, so the combination weights are poorly determined without it. Ridge regularization [7] is sometimes used in stacking [14]. Cross-validation is the standard technique for choosing the regularization parameter, but its computational cost is high; consequently, many studies have avoided optimizing the regularization parameter [9]. In this paper, we examine approaches that approximate the cross-validation procedure for choosing the regularization parameter in stacking.

This paper is organized as follows. Section 2 defines stacking. Section 3 proposes two methods that approximate the cross-validation procedure for choosing the regularization parameter in stacking. Section 4 examines the proposed methods in several examples. Section 5 gives asymptotic results for the proposed methods, and Section 6 summarizes the results.
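To make the cost concrete, here is a minimal Python sketch (standard library only) of one straightforward way to choose the ridge parameter λ in stacking: nested cross-validation. The stacking weights minimize $\sum_i (y_i - \sum_m \beta_m z_{im})^2 + \lambda \sum_m \beta_m^2$, where $z_{im}$ is the cross-validated prediction of model $m$ at $x_i$, and an outer cross-validation loop scores each λ on a grid. Everything here is an assumption for illustration: the k-nearest-neighbour base models (picked because their predictions are strongly correlated, the situation in which regularization matters), the λ grid, the synthetic data and all function names are hypothetical. It illustrates the standard, expensive baseline, not the low-cost methods proposed in this paper.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def knn_predictor(xs, ys, k):
    """Hypothetical base model: 1-D k-nearest-neighbour regression fitted on (xs, ys)."""
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def folds(n, n_folds):
    """Split indices 0..n-1 into interleaved folds; yields (train, test) index lists."""
    for fold in range(n_folds):
        test = list(range(fold, n, n_folds))
        held_out = set(test)
        yield [i for i in range(n) if i not in held_out], test


def level_one(xs, ys, ks, n_folds):
    """Cross-validated predictions z[i][m]: model m fitted without the fold containing x_i."""
    z = [[0.0] * len(ks) for _ in xs]
    for train, test in folds(len(xs), n_folds):
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for m, k in enumerate(ks):
            f = knn_predictor(tx, ty, k)
            for i in test:
                z[i][m] = f(xs[i])
    return z


def ridge_weights(z, y, lam):
    """Stacking weights beta(lam) = (Z^T Z + lam I)^{-1} Z^T y (no intercept)."""
    p = len(z[0])
    a = [[sum(r[i] * r[j] for r in z) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(z, y)) for i in range(p)]
    return solve(a, b)


def nested_cv_error(xs, ys, ks, lambdas, n_folds=5):
    """Outer-CV estimate of the stacked predictor's mean squared error for each lambda."""
    err = {lam: 0.0 for lam in lambdas}
    for train, test in folds(len(xs), n_folds):
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        z = level_one(tx, ty, ks, n_folds)  # inner CV on the outer training part only
        base = [knn_predictor(tx, ty, k) for k in ks]  # base models refitted on outer training part
        preds = {i: [f(xs[i]) for f in base] for i in test}
        for lam in lambdas:
            beta = ridge_weights(z, ty, lam)
            for i in test:
                stacked = sum(w * p for w, p in zip(beta, preds[i]))
                err[lam] += (ys[i] - stacked) ** 2
    return {lam: e / len(xs) for lam, e in err.items()}


random.seed(1)
xs = [random.uniform(0, 3) for _ in range(80)]
ys = [math.sin(2 * x) + random.gauss(0, 0.3) for x in xs]
ks = [1, 3, 5, 9]  # similar base models, hence strongly correlated predictions
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]

cv = nested_cv_error(xs, ys, ks, lambdas)
lam_hat = min(cv, key=cv.get)
beta = ridge_weights(level_one(xs, ys, ks, 5), ys, lam_hat)  # final weights on all data
```

Each outer fold reruns the inner cross-validation of every base model, so the base models are refitted roughly $K^2 M$ times for $K$ folds and $M$ base models, while each extra value of λ only adds an $M \times M$ linear solve. With realistic base learners, those refits dominate the cost.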


Tadayoshi Fushiki, [email protected]
¹ Niigata University, Niigata, Japan



2 Model Selection, Model Averaging and Model Combination

The observations $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ are independently and identically distributed according to a distribution $p(x, y)$. We consider the problem of constructing a point prediction of $y$ at $x$ from $D$. Models $M_1, \ldots, M_M$ are assumed, and their predictions are denoted by $f_1(x; D), \ldots, f_M(x; D)$. In statistics, the final prediction is most often obtained by model selection [8]: the prediction error $E_{(x,y) \sim p(x,y)}\big[(y - f_m(x; D))^2\big]$ is estimated for each $m$, and $\hat{m}$ is chosen to minimize this estimate. The final point prediction is then $f_{\hat{m}}(x; D)$. Several other methods use all of the predictions to construct the final prediction; model averaging [5] is one such method. In Bayesian model averaging, the prediction is obtained by
M