A Statistical Method for Association Analysis of Cell Type Compositions

Licai Huang (1, 2) · Paul Little (1) · Jeroen R. Huyghe (1) · Qian Shi (3) · Tabitha A. Harrison (1) · Greg Yothers (4) · Thomas J. George (5) · Ulrike Peters (1) · Andrew T. Chan (6) · Polly A. Newcomb (1) · Wei Sun (1)

Received: 5 August 2019 / Revised: 14 March 2020 / Accepted: 28 August 2020
© International Chinese Statistical Association 2020

Abstract
Gene expression data are often collected from tissue samples composed of multiple cell types. Studies of cell type composition based on such data have recently attracted increasing research interest and have led to new methods for estimating cell type composition. The estimated compositions can then be associated with individual characteristics (e.g., genetic variants) or clinical outcomes (e.g., survival time). Such association analysis can be conducted for each cell type separately, followed by multiple testing correction. An alternative approach evaluates the association using the composition of all cell types jointly, thus aggregating association signals across cell types. A key challenge of this approach is accounting for the dependence across cell types. We propose a new method to quantify the distances between cell types while accounting for their dependencies, and we use this information for association analysis. We demonstrate our method in two applications that assess the association of immune cell type composition in tumor samples from colorectal cancer patients with (1) survival time and (2) SNP genotypes. We found that immune cell composition has prognostic value and that our distance metric leads to more accurate survival time prediction than other distance metrics that ignore cell type dependencies. In addition, survival time-associated SNPs are enriched among the SNPs associated with immune cell composition.

Keywords  Cell type composition · Genome-wide associations · Survival time
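The abstract contrasts two strategies: testing each cell type separately with a multiple testing correction, and testing all cell types jointly so that weak signals aggregate. The following is a minimal sketch of both routes under simplifying assumptions that are ours, not the paper's: association is measured by Pearson correlation with a numeric covariate, significance comes from permuting that covariate, per-cell-type p-values are Bonferroni-adjusted, and the joint statistic is simply the sum of squared correlations. It is not the authors' distance-based method, and `pearson` and `composition_tests` are hypothetical names.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def composition_tests(props, outcome, n_perm=2000, seed=0):
    """Permutation tests of association between compositions and an outcome.

    props   : list of per-sample composition vectors (K cell-type proportions each)
    outcome : per-sample numeric covariate or outcome, same order as props
    Returns (Bonferroni-adjusted per-cell-type p-values, joint p-value).
    """
    rng = random.Random(seed)
    k = len(props[0])
    # One column of proportions per cell type, across all samples.
    columns = [[row[j] for row in props] for j in range(k)]

    def compute_stats(y):
        r = [pearson(col, y) for col in columns]
        # Joint statistic (an illustrative choice): sum of squared correlations.
        return r, sum(v * v for v in r)

    obs_r, obs_joint = compute_stats(outcome)
    exceed = [0] * k
    exceed_joint = 0
    y = list(outcome)
    for _ in range(n_perm):
        # Shuffling only the outcome leaves each sample's composition vector
        # intact, so the null keeps the dependence among cell types.
        rng.shuffle(y)
        r, g = compute_stats(y)
        for j in range(k):
            if abs(r[j]) >= abs(obs_r[j]):
                exceed[j] += 1
        if g >= obs_joint:
            exceed_joint += 1

    # Add-one correction keeps permutation p-values strictly positive.
    raw = [(e + 1) / (n_perm + 1) for e in exceed]
    per_type = [min(1.0, p * k) for p in raw]  # Bonferroni across K cell types
    joint_p = (exceed_joint + 1) / (n_perm + 1)
    return per_type, joint_p


if __name__ == "__main__":
    # Hypothetical toy data: cell type 0 rises with the outcome and
    # cell type 2 absorbs the remainder so each row sums to 1.
    outcome = list(range(30))
    props = []
    for i in outcome:
        a = 0.1 + 0.01 * i
        b = 0.3 + 0.01 * ((i * 7) % 5)
        props.append([a, b, 1.0 - a - b])
    per_type, joint_p = composition_tests(props, outcome, n_perm=500)
    print(per_type, joint_p)
```

Because only the covariate is permuted, the null distribution of the joint statistic reflects whatever dependence exists among cell types, including the negative correlations forced by proportions summing to one. A survival outcome would need a different per-cell-type statistic (for example, a Cox model score), which this sketch does not attempt.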

Electronic supplementary material  The online version of this article (https://doi.org/10.1007/s12561-020-09293-0) contains supplementary material, which is available to authorized users.
* Polly A. Newcomb [email protected]
* Wei Sun [email protected]
Extended author information available on the last page of the article

Statistics in Biosciences

1 Introduction

Variation of cell type composition across tissue samples can explain a substantial proportion of gene expression variation. For example, a recent study showed that more than 88% of gene expression variation in human brains may be explained by the variation of cell type compositions [1]. Many methods have been developed to estimate cell type composition of a tissue sample based on gene expression data from this sample and an external reference of cell type-specific gene expression [2, 3]. While assessing associations of these estimates with individual characteristics (e.g., genetic and environmental factors) or clinical outcomes (e.g., survival time) is of interest, it is a challenging pr