Kernel machines for current status data



Yael Travis‑Lumer¹ · Yair Goldberg¹

Received: 17 July 2019 / Revised: 22 September 2020 / Accepted: 4 November 2020

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

In survival analysis, estimating the failure time distribution is an important and difficult task, since the data is usually subject to censoring. In this paper we consider current status data, a type of data in which the failure time cannot be observed directly: all that is known is whether or not the failure time exceeds a random monitoring time. We propose a flexible kernel machine approach for estimating the expected failure time as a function of the covariates from current status data. To obtain the kernel machine decision function, we minimize a regularized version of the empirical risk with respect to a new loss function. Using finite sample bounds and novel oracle inequalities, we prove that the resulting estimator converges to the true conditional expectation for a large family of probability measures. Finally, we present a simulation study and an analysis of real-world data that compare the performance of the proposed approach to existing methods. We show empirically that our approach is comparable to the current state of the art, and in some cases performs better.

Keywords  Kernel machines · Oracle inequalities · Support vector regression · Survival analysis · Universal consistency

1 Introduction

In this paper we aim to develop a general model-free method for analyzing current status data using machine learning techniques. In particular, we propose a kernel machine learning method for estimating the expected failure time from current status data. Kernel machines, also known as support vector machines, were originally introduced by Vapnik in the 1990s and are closely tied to statistical learning theory (Vapnik 1999). Kernel machines are learning algorithms that utilize positive definite kernels (Hofmann et al. 2008). The choice of kernel machines for current status data is motivated by the fact that kernel machines are easy to implement, train quickly, produce decision functions with strong generalization ability, and can guarantee

Editor: Jean-Philippe Vert.

* Yael Travis‑Lumer, travis‑[email protected]

¹ The Faculty of Industrial Engineering and Management, Technion, 3200003 Haifa, Israel




Machine Learning

convergence to the optimal solution under some weak assumptions (Shivaswamy et al. 2007). With current status data, the failure time T is never observed directly; all that is known is whether or not T exceeds a random monitoring time C. Current status data is also known in the literature as type I interval censored data (Huang and Wellner 1997). This data format is quite common and arises in various fields. Jewell and van der Laan (2004) mention a few examp