Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data


ORIGINAL RESEARCH

Souvik Seal1 · Jeffrey A. Boatman1 · Matt McGue2 · Saonli Basu1

Received: 10 November 2019 / Accepted: 27 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Genome-wide association studies (GWASs) are a popular tool for detecting associations between genetic variants or single nucleotide polymorphisms (SNPs) and complex traits. Family data introduce complexity because family members are not independent. Methods for non-independent data are well established, but when the GWAS contains distinct family types, explicitly modeling between-family-type differences in the dependence structure significantly increases the computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare multiple candidate methods for single-SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study how different working correlation structures for GEE influence the GWAS findings, and we compare the performance of the different analysis methods for conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to its applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.

Keywords  Family data · Population-based association analysis · Genome-wide scan · Generalized estimating equation · Generalized linear mixed effect model · Generalized least squares

Edited by Stacey Cherny.

* Souvik Seal [email protected]

1 Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
2 Department of Psychology, University of Minnesota, Minneapolis, MN, USA

Introduction

Genome-wide association studies (GWASs) seek to detect associations between genetic variants and observed disease phenotypes. Many such GWASs involve analyzing family data (Duerr et al. 2006; Graham et al. 2008; Benyamin et al. 2009), but explicit modeling of the dependencies among the family members introduces additional complexities. The observations within a family are correlated due to both shared environment and shared genes, which complicates the statistical modeling. Methods for analyzing quantitative traits with family data have been well developed; methods for conducting a genome-wide association study with binary traits have received less attention. One class of methods for analyzing binary family data, generalized linear mixed models (GLMMs), introduces random effects into the statistical models to account for the within-family dependencies. Other methods, such as generalized estimating equations (GEE) (Liang and Zeger 1986), estimate the marginal, population-averaged effect of a genetic variant on the phenotype. Another clas