Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling


RESEARCH ARTICLE

Open Access

Adrián Mosquera Orgueira¹,²,³,⁴*, José Ángel Díaz Arias¹,²,⁴, Miguel Cid López¹,², Andrés Peleteiro Raíndo¹,², Beatriz Antelo Rodríguez¹,²,⁴, Carlos Aliste Santos¹,⁵, Natalia Alonso Vence¹,², Ángeles Bendaña López¹,², Aitor Abuín Blanco¹,², Laura Bao Pérez¹,², Marta Sonia González Pérez¹,², Manuel Mateo Pérez Encinas¹,²,⁴, Máximo Francisco Fraga Rodríguez¹,⁴,⁵ and José Luis Bello López¹,²,⁴

Abstract

Background: Thirty to forty percent of patients with diffuse large B-cell lymphoma (DLBCL) have an adverse clinical course. Increased understanding of DLBCL biology has shed light on the clinical evolution of this disease, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed to enable survival predictions at the individual patient level. In this study we investigated new machine learning-based survival models using transcriptomic and clinical data.

Methods: Gene expression profiling (GEP) data from two publicly available retrospective DLBCL cohorts were analyzed. Cox regression and unsupervised clustering were performed on the larger cohort to identify probes associated with overall survival. Random forests were built to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical information. Cross-validation was used to compare models in the training set, and Harrell's concordance index (c-index) was used to assess each model's predictive accuracy. Results were validated in an independent test set.

Results: Two hundred thirty-three and sixty-four patients were included in the training and test sets, respectively. We first derived and validated a 4-gene expression clustering that was independently associated with lower survival in 20% of patients. This pattern included the following genes: TNFRSF9, BIRC3, BCL2L1 and G3BP2. We then applied machine learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming the available clinical information and COO classification. The best final model integrated clinical information, COO classification, the 4-gene-based clustering and the expression levels of 50 individual genes (training set c-index, 0.8404; test set c-index, 0.7942).
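
Harrell's c-index, used in the abstract to compare survival models, is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival. The sketch below is a minimal, generic Python illustration, not the authors' code. The function name `harrell_c_index` and the convention that a higher score means higher predicted risk (shorter expected survival) are assumptions. A pair counts as comparable only when the patient with the shorter follow-up had an observed death; pairs with tied follow-up times are skipped, which is only one of several tie-handling conventions used by established implementations.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if death was observed, 0 if censored
    risk_scores: predicted risk; higher means shorter expected survival
                 (an assumed convention for this sketch)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the shorter time is an observed event;
        # pairs with tied times are skipped in this simplified sketch.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # higher risk died first: concordant
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: the third patient is censored.
    times = [5, 8, 12, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.4, 0.2]
    print(harrell_c_index(times, events, risks))  # 1.0 (perfect ranking)
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a test-set value of 0.7942 means that roughly 79% of comparable test pairs were ordered correctly, with tied predictions counted as half.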

*Correspondence: [email protected]
¹ Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
² Department of Hematology, SERGAS, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), Santiago, Spain
Full list of author information is available at the end of the article.

© The Author(s). 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give app…