Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm


Journal of Translational Medicine Open Access

RESEARCH

Kerry E. Poppenberg1,4†, Vincent M. Tutino1,2,4,10†, Lu Li3, Muhammad Waqas4,9, Armond June10, Lee Chaves12, Kaiyu Jiang5, James N. Jarvis5,6, Yijun Sun5,7, Kenneth V. Snyder1,4,8,9, Elad I. Levy1,4,8, Adnan H. Siddiqui1,4,8, John Kolega1,10 and Hui Meng1,2,4,11*

Abstract

Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under the ROC curve (AUC) of 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.

Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.

*Correspondence: [email protected]
† Kerry E. Poppenberg and Vincent M. Tutino contributed equally to this work
1 Canon Stroke and Vascular Research Center, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214
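
The evaluation scheme summarized in the Methods (a random 94/40 split of the 134 samples, followed by accuracy and ROC AUC on the held-out cohort) can be illustrated with a small, dependency-free Python sketch. This is not the authors' code: the function names `split_cohort`, `roc_auc`, and `accuracy`, the random seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the toy numbers are not study data. The AUC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the procedure the authors used.

```python
import random


def split_cohort(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into training and testing cohorts."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Cohort sizes taken from the abstract: 134 samples, 94 training, 40 testing.
train_ids, test_ids = split_cohort(range(134), n_train=94)

# Toy labels (1 = IA, 0 = control) and classifier scores; made up, NOT study data.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(len(train_ids), len(test_ids))     # 94 40
print(roc_auc(toy_labels, toy_scores))   # 0.9375
print(accuracy(toy_labels, toy_scores))  # 0.75
```

In a fuller pipeline, the scores would come from a classifier trained only on the training cohort using the LASSO-selected transcripts, so that the testing cohort remains untouched until evaluation.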