The conditional censored graphical lasso estimator


Luigi Augugliaro¹ · Gianluca Sottile¹,² · Veronica Vinciotti³

Received: 25 October 2019 / Accepted: 28 April 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In many applied fields, such as genomics, different types of data are collected on the same system, and it is not uncommon that some of these datasets are subject to censoring as a result of the measurement technologies used, such as data generated by polymerase chain reactions and flow cytometers. When the overall objective is that of network inference, at possibly different levels of a system, information coming from different sources and/or different steps of the analysis can be integrated into one model with the use of conditional graphical models. In this paper, we develop a doubly penalized inferential procedure for a conditional Gaussian graphical model when data can be subject to censoring. The computational challenges of handling censored data in high dimensionality are met with the development of an efficient expectation-maximization algorithm, based on approximate calculations of the moments of truncated Gaussian distributions and on a suitably derived two-step procedure alternating graphical lasso with a novel block-coordinate multivariate lasso approach. We evaluate the performance of this approach in an extensive simulation study and on gene expression data generated by RT-qPCR technologies, where we are able to integrate network inference, differential expression detection and data normalization into one model.

Keywords Censored data · Censored graphical lasso · Conditional Gaussian graphical models · High-dimensional setting · Sparsity

¹ Department of Economics, Business and Statistics, University of Palermo, Palermo, Italy
² Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Palermo, Italy
³ Department of Mathematics, Brunel University London, Uxbridge, UK

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11222-020-09945-7) contains supplementary material, which is available to authorized users.

1 Introduction

Conditional graphical models, also called conditional random fields, were originally introduced in Lafferty et al. (2001). Formally, let $y = (y_1, \ldots, y_p)$ and $x = (x_1, \ldots, x_q)$ be p- and q-dimensional random vectors, respectively, and let $G = (V, E)$ be a graph with vertex set $V = \{1, \ldots, p\}$, indexing only the entries in $y$, and edge set $E \subseteq V \times V$, where $(h, k) \in E$ iff there is a directed edge from the vertex $h$ to $k$ in $G$. An edge is called undirected if both $(h, k)$ and $(k, h)$ are in $E$, and the graph $G$ is called undirected if it has only undirected edges, which is the case that we will consider in this paper. Let $z = (x^\top, y^\top)^\top$ and denote by $f(y \mid x)$ the conditional density function of $y$ given $x$. W