Sparse group fused lasso for model segmentation: a hybrid approach



David Degras

Received: 9 December 2019 / Revised: 27 September 2020 / Accepted: 8 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

This article introduces the sparse group fused lasso (SGFL) as a statistical framework for segmenting sparse regression models with multivariate time series. To compute solutions of the SGFL, a nonsmooth and nonseparable convex program, we develop a hybrid optimization method that is fast, requires no tuning parameter selection, and is guaranteed to converge to a global minimizer. In numerical experiments, the hybrid method compares favorably to state-of-the-art techniques with respect to computation time and numerical accuracy; benefits are particularly substantial in high dimension. The method’s statistical performance is satisfactory in recovering nonzero regression coefficients and excellent in change point detection. An application to air quality data is presented. The hybrid method is implemented in the R package sparseGFL available on the author’s GitHub page.

Keywords: Multivariate time series · Model segmentation · High-dimensional regression · Convex optimization · Hybrid algorithm

Mathematics Subject Classification: 37M10 Time series analysis · 62J05 Linear regression · 62J07 Shrinkage estimators (Lasso) · 65K10 Numerical optimization

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11634-020-00424-5) contains supplementary material, which is available to authorized users.

David Degras, Department of Mathematics, University of Massachusetts Boston, 100 William T. Morrissey Blvd., Boston, MA 02125, USA

1 Introduction

In the analysis of complex signals, using a single statistical model with a fixed set of parameters is rarely enough to track data variations over their entire range. In long and/or high-dimensional time series, for example, the presence of nonstationarity, either in the form of slowly drifting dynamics or of abrupt regime changes, requires that statistical models flexibly account for temporal variations in signal characteristics. To overcome the intrinsic limitations of approaches based on a single model vis-à-vis heterogeneous and nonstationary signals, model segmentation techniques have been successfully employed in various fields including image processing (Alaíz et al. 2013; Friedman et al. 2007), genetics (Bleakley and Vert 2011; Tibshirani and Wang 2007), brain imaging (Beer et al. 2019; Cao et al. 2018; Ombao et al. 2005; Xu and Lindquist 2015; Zhou et al. 2013), finance (Hallac et al. 2019; Nystrup et al. 2017), industrial monitoring (Saxén et al. 2016), oceanography (Ranalli et al. 2018), seismology (Ohlsson et al. 2010), and ecology (Alewijnse et al. 2018).

Model segmentation consists in partitioning the domain of the signal (e.g. the temporal range of a time series or the lattice of a digital image) into a small number of segments or regions such that for each segment