Penalized least squares approximation methods and their applications to stochastic processes



Takumi Suzuki¹,² · Nakahiro Yoshida¹,²

Received: 28 December 2018 / Accepted: 12 November 2019
© Japanese Federation of Statistical Science Associations 2020

Abstract  We construct an objective function that consists of a quadratic approximation term and an Lq penalty term (0 < q ≤ 1). Thanks to the quadratic approximation, we can handle various kinds of loss functions in a unified way, and by taking advantage of the Lq penalty term, we can perform variable selection and parameter estimation simultaneously. In this article, we show that our estimator has the oracle properties and an even better property. We also treat stochastic processes as applications.

Keywords  Variable selection · Least squares approximation · Cox process · Diffusion type process
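The abstract's objective combines a quadratic approximation term with an Lq penalty (0 < q ≤ 1). As a rough illustration only, and not the authors' estimator, the Python sketch below assumes a separable quadratic term Σ_j g_j(θ_j − a_j)², where a is some initial unpenalized estimate and the g_j > 0 are hypothetical diagonal weights standing in for a general approximation matrix. Because the Lq penalty is nonconvex when q < 1, each coordinate is minimized by a plain grid search on the segment between 0 and a_j, which always contains the minimizer (moving past a_j or to the opposite sign increases both terms). The function names `penalized_lsa` and `penalized_coordinate` are hypothetical.

```python
# Hypothetical sketch (not the authors' method): minimize a separable
# penalized quadratic approximation
#   Q(theta) = sum_j g_j * (theta_j - a_j)**2 + lam * sum_j |theta_j|**q
# where a is an initial (unpenalized) estimate and g_j > 0 are ASSUMED
# diagonal weights standing in for a general quadratic-approximation matrix.


def scalar_objective(t, a, g, lam, q):
    """One coordinate of Q: weighted quadratic fit term plus the L_q penalty."""
    return g * (t - a) ** 2 + lam * abs(t) ** q


def penalized_coordinate(a, g, lam, q, n_grid=30000):
    """Minimize g*(t - a)**2 + lam*|t|**q over t by grid search.

    The penalty pulls t toward 0, so the minimizer lies between 0 and a.
    A grid search is used because the problem is nonconvex for q < 1.
    Ties are resolved toward the smaller |t| (t = 0 is checked first).
    """
    sign = 1.0 if a >= 0 else -1.0
    best_t, best_val = 0.0, scalar_objective(0.0, a, g, lam, q)
    for k in range(1, n_grid + 1):
        t = sign * abs(a) * k / n_grid
        val = scalar_objective(t, a, g, lam, q)
        if val < best_val:
            best_t, best_val = t, val
    return best_t


def penalized_lsa(a, g, lam, q):
    """Coordinate-wise minimizer of Q for diagonal weights g."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must satisfy 0 < q <= 1")
    if any(gj <= 0 for gj in g):
        raise ValueError("weights g_j must be positive")
    return [penalized_coordinate(aj, gj, lam, q) for aj, gj in zip(a, g)]


if __name__ == "__main__":
    # Toy call: the small coordinate 0.4 is shrunk exactly to zero.
    print(penalized_lsa([3.0, 0.4, -1.5], [1.0, 1.0, 1.0], lam=2.0, q=1.0))
    print(penalized_lsa([3.0, 0.4], [1.0, 1.0], lam=2.0, q=0.5))
```

With q = 1 the penalty is the L1 penalty of LASSO, and each coordinate reduces to soft thresholding, sign(a_j)(|a_j| − λ/(2g_j))₊. In the toy calls, the coordinate 0.4 is set exactly to zero for both q = 1 and q = 0.5, while with q = 0.5 and the same λ the large coordinate 3.0 is shrunk less (to roughly 2.7 instead of 2.0).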

1 Introduction

The least absolute shrinkage and selection operator (LASSO; Tibshirani 1996) is a useful and widely studied approach to the problem of variable selection. Compared with other estimation methods, LASSO's major advantage is that it performs parameter estimation and variable selection simultaneously (Fan and Li 2001; Tibshirani 1996). LASSO was originally introduced for linear regression problems. Suppose that $\mathbf{y} = [y_1, \dots, y_T]'$ is a response vector and $\mathbf{x}_j = [x_{1j}, \dots, x_{Tj}]'$, $j = 1, \dots, d$, are linearly independent predictors, where the prime denotes the matrix transpose. Then the LASSO estimator is defined by

This work was in part supported by Japan Science and Technology Agency CREST JPMJCR14D7; Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research No. 17H01702 (Scientific Research); and by a Cooperative Research Program of the Institute of Statistical Mathematics.

¹ Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
² CREST, Japan Science and Technology Agency, Kawaguchi, Japan






Japanese Journal of Statistics and Data Science

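$$
\hat{\theta}_{\mathrm{LASSO}} = \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \left\{ \Big\| \mathbf{y} - \sum_{j=1}^{d} \mathbf{x}_j \theta_j \Big\|^2 + \lambda \sum_{j=1}^{d} |\theta_j| \right\}, \tag{1.1}
$$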

where $\lambda$ is a nonnegative regularization parameter. The second term in (1.1) is the so-called L1 penalty. Because the L1 penalty is singular (nondifferentiable) at the origin, LASSO can shrink some coefficients exactly to zero and thus performs variable selection automatically. However, it is known that LASSO variable selection can be inconsistent (see, e.g., Zou 2006), because the L1 penalty penalizes all coefficients equally. Zou (2006) assigns different weights to different coefficients, and the resulting estimator is called the adaptive LASSO estimator. More precisely, the adaptive LASSO estimator $\hat{\theta}_{\mathrm{aLASSO}}$ is defined by

$$
\hat{\theta}_{\mathrm{aLASSO}} = \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \left\{ \Big\| \mathbf{y} - \sum_{j=1}^{d} \mathbf{x}_j \theta_j \Big\|^2 + \lambda_T \sum_{j=1}^{d} \hat{w}_j |\theta_j| \right\},
$$

where $\hat{w} = [\hat{w}_j]_j$ is a weight vector defined by $\hat{w}_j = 1/|\hat{\theta}_j|^{\gamma}$ for some constant $\gamma > 0$ and an initial estimator $\hat{\theta} = [\hat{\theta}_j]_j$. The adaptive LASSO method requires co