Consistency bounds and support recovery of d-stationary solutions of sparse sample average approximations
Miju Ahn

Received: 30 December 2018 / Accepted: 1 November 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
This paper studies properties of the d(irectional)-stationary solutions of sparse sample average approximation problems involving difference-of-convex sparsity functions in a deterministic setting. These properties are investigated with respect to a vector satisfying a verifiable assumption that relates the empirical sample average approximation problem to the expectation minimization problem defined by an underlying data distribution. We derive bounds on the distance between the two vectors and on the difference of the model outcomes they generate. Furthermore, we study the inclusion relationships between their supports, i.e., the sets of indices with nonzero values. We provide conditions under which the support of a d-stationary solution is contained within, and contains, the support of the vector of interest; the first kind of inclusion can be shown for any given arbitrary set of indices. Some of the results presented herein generalize the existing theory for the specialized problem of ℓ1-norm regularized least squares minimization for linear regression.

Keywords Non-convex optimization · Sparse learning · Difference-of-convex program · Directional stationary solution
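To make the penalty class concrete: the capped-ℓ1 function is a standard example of a difference-of-convex (DC) sparsity surrogate in this literature. The sketch below is an illustrative Python snippet, not the paper's specific formulation; it numerically verifies the DC decomposition min(|t|, a) = |t| − max(|t| − a, 0).

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulation): the capped-L1
# penalty min(|t|, a) is a common difference-of-convex sparsity surrogate,
# since min(|t|, a) = |t| - max(|t| - a, 0), a difference of two convex
# functions.

def capped_l1(t, a=1.0):
    return np.minimum(np.abs(t), a)

def dc_decomposition(t, a=1.0):
    g = np.abs(t)                       # convex part
    h = np.maximum(np.abs(t) - a, 0.0)  # convex part that is subtracted
    return g - h

t = np.linspace(-3.0, 3.0, 13)
assert np.allclose(capped_l1(t), dc_decomposition(t))  # the two forms agree
```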
1 Introduction

Statistical models are often trained by the method of sample average approximation (SAA): given loss and model functions, the model is fit so that it achieves minimal error on the available data points, or samples. This methodology serves as a practical means of fitting the 'best' model while exploiting the restricted information contained in limited data. Ideally, the ultimate goal of a statistical learning problem is to find a model that makes robust predictions for unobserved data points, i.e., a model that yields minimal prediction error with respect to an underlying data-generating distribution.
This work is derived and extended from the last chapter [1, Chapter 4] of the author's Ph.D. dissertation, which was written under the supervision of Jong-Shi Pang.
Miju Ahn
[email protected]
Department of Engineering Management, Information, and Systems, Southern Methodist University, Dallas, TX 75205, USA
Obtaining the latter model involves solving an optimization formulation in which the expectation of the composite of the loss and the model function is minimized with respect to the data population. The connection between the SAA and the expectation minimization problems has been extensively studied in the past. A convergence result is given by the classical Law of Large Numbers, which states that, under some regularity conditions, an SAA of a given objective function converges to the expectation of the function as the sample size grows to infinity. In [25], convergence of the set of optimal solutions and the objective values of the approximation problem to those of the expectation minimization problem is studied.
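As a numerical illustration of this Law-of-Large-Numbers convergence (a minimal sketch under an assumed least-squares loss and linear model, not taken from [25]), the SAA objective evaluated at a fixed point approaches the corresponding population risk as the sample size grows:

```python
import numpy as np

# Minimal sketch of SAA convergence. Illustrative assumptions: least-squares
# loss with a linear model y = x' beta* + noise, x ~ N(0, I),
# noise ~ N(0, sigma^2). For a fixed beta, the population risk is
# E[(x' beta - y)^2] = ||beta - beta*||^2 + sigma^2, which the sample
# average should approach as n grows.

rng = np.random.default_rng(0)
d, sigma = 5, 0.5
beta_star = rng.normal(size=d)
beta = np.zeros(d)                                   # fixed test point
population_risk = np.sum((beta - beta_star) ** 2) + sigma ** 2

for n in [10, 100, 1_000, 10_000, 100_000]:
    X = rng.normal(size=(n, d))
    y = X @ beta_star + sigma * rng.normal(size=n)
    saa_value = np.mean((X @ beta - y) ** 2)         # sample average objective
    print(f"n={n:>6}  SAA={saa_value:.4f}  population={population_risk:.4f}")
```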