A non-parametric approach to population structure inference using multilocus genotypes

Nianjun Liu1 and Hongyu Zhao2,3*

1 Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
2 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA
3 Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
* Correspondence to: Tel: +1 203 785 6271; Fax: +1 203 785 6912; E-mail: [email protected]

Date received (in revised form): 8th March 2006

Abstract

Inference of population structure from genetic markers is helpful in diverse situations, such as association and evolutionary studies. In this paper, we describe a two-stage strategy for inferring population structure from multilocus genotype data. In the first stage, we use dimension reduction methods such as singular value decomposition to reduce the dimension of the data, and in the second stage, we use clustering methods on the reduced data to identify population structure. The strategy can identify population structure and assign each individual to its corresponding subpopulation. It does not depend on any population genetics assumptions (such as Hardy–Weinberg equilibrium and linkage equilibrium between loci within populations) and can be used with any genotype data. When applied to real and simulated data, the strategy is found to have similar or better performance compared with STRUCTURE, the most popular method in current use. Therefore, the proposed strategy provides a useful alternative for analysing population data.

Keywords: population structure, subpopulation, singular value decomposition, dimension reduction, clustering
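The two-stage strategy described in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 allele counts and uses k-means as the clustering step; the abstract only says "clustering methods", so that choice, the farthest-first initialisation, the function names and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_loci, rng):
    """Toy data: 0/1/2 allele counts for two hypothetical subpopulations."""
    # Hypothetical allele frequencies that differ between the two populations.
    freqs = np.stack([rng.uniform(0.05, 0.45, n_loci),
                      rng.uniform(0.55, 0.95, n_loci)])
    labels = np.repeat([0, 1], n_per_pop)
    # Binomial draws are only a convenient toy generator for this sketch.
    genotypes = rng.binomial(2, freqs[labels]).astype(float)
    return genotypes, labels


def reduce_dimension(genotypes, n_components):
    """Stage 1: project individuals onto the leading singular vectors."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=100):
    """Stage 2 (assumed): plain k-means with farthest-first initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


rng = np.random.default_rng(0)
genotypes, truth = simulate_genotypes(n_per_pop=30, n_loci=100, rng=rng)
reduced = reduce_dimension(genotypes, n_components=2)
assignment = kmeans(reduced, k=2)
# Cluster labels are arbitrary, so score agreement up to label swapping.
agreement = max(np.mean(assignment == truth), np.mean(assignment != truth))
print(f"agreement with simulated labels: {agreement:.2f}")
```

In this sketch the number of retained components and the number of clusters are fixed by hand; how to choose them is not stated in the abstract.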

Introduction

Information about the population structure of species is useful in a variety of situations, such as admixture mapping, subspecies classification, genetic barrier detection and evolutionary study.1–5 For example, anthropologists may have the remains of ancient people, supplied by archaeologists, and want to learn about the relationship between the ancient people and modern populations to infer the evolutionary history of human beings. Population structure can be identified based on visible characteristics such as language, culture, physical appearance and geographical region, but this can be subjective and may bear no relevance to genetics.3

In other situations, the presence of population structure may constitute a practical nuisance. In association studies, a case-control design is often used to identify genetic variants underlying complex traits by comparing allele frequencies between unrelated individuals who are affected and those who are unaffected. Population structure in the sample, however, can lead to spurious associations between a candidate marker and a phenotype.6,7 In forensic studies, the identification of reference groups is very important, but this can be difficult when population structure exists.4,8 In all of these situations, the first step is to ide
