Fully asynchronous stochastic coordinate descent: a tight lower bound on the parallelism achieving linear speedup



Mathematical Programming, Series A

Yun Kuen Cheung¹ · Richard Cole² · Yixin Tao²

Received: 7 May 2019 / Accepted: 4 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2020

Abstract We seek tight bounds on the viable parallelism in asynchronous implementations of coordinate descent that achieve linear speedup. We focus on asynchronous coordinate descent (ACD) algorithms on convex functions which consist of the sum of a smooth convex part and a possibly non-smooth separable convex part. We quantify the shortfall in progress compared to the standard sequential stochastic gradient descent. This leads to a simple yet tight analysis of the standard stochastic ACD in a partially asynchronous environment, generalizing and improving the bounds in prior work. We also give a considerably more involved analysis for general asynchronous environments in which the only constraint is that each update can overlap with at most q others. The new lower bound on the maximum degree of parallelism attaining linear speedup is tight and improves the best prior bound almost quadratically.

Keywords Asynchronous coordinate descent · Stochastic coordinate descent · Linear speedup · Amortized analysis

Mathematics Subject Classification 90C06 Large-scale problems in mathematical programming · 90C25 Convex programming
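To make the setting concrete, here is a minimal Python sketch of the sequential stochastic proximal coordinate descent baseline on an objective of the form F(x) = f(x) + Σ_k Ψ_k(x_k), instantiated with a least-squares smooth part and the Lasso penalty Ψ_k(x_k) = λ|x_k|. The choice of f, the step sizes, and all identifiers are illustrative assumptions; this is only a sketch of the sequential baseline that speedup is measured against, not the paper's asynchronous algorithm.

```python
import numpy as np

# Illustrative sketch only: sequential stochastic proximal coordinate descent for
# F(x) = f(x) + sum_k Psi_k(x_k), with the assumed choices f(x) = 0.5*||Ax - b||^2
# and Psi_k(x_k) = lam*|x_k| (the Lasso penalty). Not the paper's ACD algorithm.

def soft_threshold(z, t):
    """Proximal operator of t*|.| (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent_lasso(A, b, lam, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    # Coordinate-wise Lipschitz constants of grad f: L_k = ||A[:, k]||^2.
    L = (A ** 2).sum(axis=0)
    residual = A @ x - b          # maintained so each coordinate step costs O(m)
    for _ in range(n_epochs * n):
        k = rng.integers(n)       # sample a coordinate uniformly at random
        g_k = A[:, k] @ residual  # k-th partial derivative of f at x
        # Proximal coordinate step with step size 1/L_k, then apply Psi_k's prox.
        x_new_k = soft_threshold(x[k] - g_k / L[k], lam / L[k])
        residual += A[:, k] * (x_new_k - x[k])
        x[k] = x_new_k
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = coordinate_descent_lasso(A, b, lam=0.1)
    print("support recovered:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```

Each step touches a single coordinate and costs O(m) because the residual Ax − b is maintained incrementally. In an asynchronous implementation, many such steps run concurrently and may read stale coordinates of x; that staleness is the source of the shortfall in progress quantified above.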

Part of the work was done while Yun Kuen Cheung held positions at the Courant Institute, NYU, at the Faculty of Computer Science, University of Vienna, and at the Max Planck Institute for Informatics, Saarland Informatics Campus. He was supported in part by NSF Grant CCF-1217989, the Vienna Science and Technology Fund (WWTF) project ICT10-002, Singapore NRF 2018 Fellowship NRF-NRFF2018-07, and MOE AcRF Tier 2 Grant 2016-T2-1-170. Additionally, the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement No. 340506. Richard Cole and Yixin Tao's work was supported in part by NSF Grants CCF-1217989, CCF-1527568 and CCF-1909538.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10107-020-01552-8) contains supplementary material, which is available to authorized users.

Corresponding author: Yixin Tao [email protected]

Extended author information available on the last page of the article


1 Introduction

We consider the problem of finding an (approximate) minimum point of a convex function F : Rⁿ → R of the form

F(x) = f(x) + \sum_{k=1}^{n} \Psi_k(x_k),

where f : Rⁿ → R is a smooth convex function,¹ and each Ψ_k : R → R is a univariate convex function, but may be non-smooth. Such functions occur in many data analysis and machine learning problems, such as linear regression (e.g., the Lasso approach to regularized least squares [28]), where Ψ_k(x_k) = |x_k|, logistic regression [21], ridge regression [26], where Ψ_k(x_k) is a quadratic function, and