Mixed logistic regression in genome-wide association studies


METHODOLOGY ARTICLE

Open Access

Jacqueline Milet1, David Courtin1, André Garcia1 and Hervé Perdry2*

*Correspondence: [email protected]
2 Université Paris-Saclay, UVSQ, Inserm, CESP, 94807 Villejuif, France
Full list of author information is available at the end of the article

Abstract

Background: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with the case-control status analyzed as a quantitative phenotype. Chen et al. showed in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for mixed logistic regression (MLR). However, this test does not produce an estimate of the variants' effects. We propose two computationally efficient methods to estimate the variants' effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated data and real genomic data from a recent GWAS in two geographically close populations in West Africa.

Results: We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. The variants' effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample.

Conclusion: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. a mixed Cox model for survival analysis).

Keywords: GWAS, Mixed-models, Logistic regression
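For orientation, a mixed logistic regression for a binary trait is commonly written as below. This is a sketch in the usual GMMAT-style notation, which is an assumption here and is not taken from the text above: Y_i is the case-control status of individual i, X_i the covariates, G_i the genotype at the tested variant, and K a genetic relationship matrix.

```latex
\[
  \operatorname{logit} \, \mathbb{P}(Y_i = 1 \mid \omega_i)
    = X_i \beta + G_i \gamma + \omega_i ,
  \qquad
  \omega = (\omega_1, \dots, \omega_n)^{\top} \sim \mathcal{N}(0, \tau K).
\]
```

In this notation, a score test assesses the null hypothesis \gamma = 0 using only the model fitted without the variant, which is why it yields a p-value but no estimate of \gamma. Estimating the variants' effects amounts to estimating \gamma for each tested variant.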

Background

Population stratification has long been known to be a source of spurious associations in genetic association studies [1]: if the frequency of the phenotype of interest varies across the population strata, the phenotype will be associated with any allele whose frequency varies accordingly. An early and elegant solution to this issue was the use of family data, notably in the Transmission Disequilibrium Test (TDT) [2] and in the Family Based Association Test (FBAT) [3]. However, these methods required the ascertainment and genotyping of affected individuals' relatives, which limited their practical feasibility. The advent of Genome-Wide Association Studies (GWAS), demanding increasingly large samples to detect weaker and weaker effects, made the problem even more acute.

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
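The stratification argument in the Background can be illustrated with a small simulation. The sketch below is not taken from the article: the two strata, their allele frequencies and prevalences, the sample sizes, and the function names are all hypothetical choices made for illustration. The variant has no effect on disease within either stratum, yet a naive pooled allelic chi-square test shows a strong association because both allele frequency and prevalence differ between strata.

```python
import numpy as np


def chi2_allelic(genotypes, status):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts by status."""
    table = np.empty((2, 2))
    for row, s in enumerate((0, 1)):  # row 0: controls, row 1: cases
        g = genotypes[status == s]
        alt = g.sum()
        table[row] = (2 * g.size - alt, alt)  # (reference alleles, alternate alleles)
    expected = (
        table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    )
    return float(((table - expected) ** 2 / expected).sum())


def simulate(seed=0, n_per_stratum=2000):
    """Two strata; genotype and status are drawn independently within each stratum."""
    rng = np.random.default_rng(seed)
    freqs = (0.1, 0.5)  # hypothetical allele frequency in each stratum
    prevs = (0.1, 0.4)  # hypothetical disease prevalence in each stratum
    geno, status, stratum = [], [], []
    for k, (p, prev) in enumerate(zip(freqs, prevs)):
        geno.append(rng.binomial(2, p, n_per_stratum))
        status.append(rng.binomial(1, prev, n_per_stratum))
        stratum.append(np.full(n_per_stratum, k))
    return np.concatenate(geno), np.concatenate(status), np.concatenate(stratum)


geno, status, stratum = simulate()

# Pooled analysis ignores the strata; stratified analysis tests within each one.
pooled = chi2_allelic(geno, status)
within = [chi2_allelic(geno[stratum == k], status[stratum == k]) for k in (0, 1)]

print(f"pooled chi-square: {pooled:.1f}")
print("within-stratum chi-squares:", [round(w, 2) for w in within])
```

The pooled statistic is far above any usual significance threshold for one degree of freedom, while the within-stratum statistics behave like draws from the null distribution. Mixed models address the same confounding when the strata are not known in advance.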
