A survey of current software for haplotype phase inference


Michael E. Weale
Bloomsbury Analytical Services, 28/30 Little Russell Street, London WC1A 2HN, UK; Tel: +44 020 7404 3040; Fax: +44 020 7404 2083; Email: [email protected]
Date received (in revised form): 9th November 2003

Abstract

In the past two years, tracking the explosion in data due to ever-improving single nucleotide polymorphism (SNP) maps and cheaper high-throughput genotyping technologies, a bewildering array of new algorithms and associated software has appeared for haplotype phase inference. There are two alternatives to haplotype inference. The first is to resolve haplotypes completely, either by in vitro methods or by typing close pedigrees; this is expensive and, in the case of pedigrees, not guaranteed to succeed. The second is to ignore haplotype-level analysis in favour of genotype-level analysis, which avoids the danger of treating inferred haplotypes as real but potentially denies the researcher valuable analytic insights. This review attempts a snapshot of this rapidly moving field as it stands at present, and is mainly restricted, given the current predominance of SNP genotyping, to the consideration of diallelic data. For completeness, the review will occasionally refer to algorithms for which no software exists.

Keywords: haplotype phase inference, algorithms, software, parsimony, maximum likelihood, Bayesian analysis

Introduction

Haplotype phase algorithms can be conveniently split into three main types: parsimony, maximum likelihood and Bayesian. The researcher may want to infer haplotype frequencies in the population, to impute the haplotypes possessed by given individuals, or both. In general, parsimony methods most naturally estimate individual haplotypes, maximum likelihood methods most naturally estimate population frequencies and Bayesian methods can do both.

Parsimony algorithms avoid explicit likelihood calculations by minimising a ‘costly’ constraint. The grandfather of all haplotype phase algorithms (an elderly 13-year-old) is Clark’s method,1 a simple iterative procedure inspired by the constraint ‘minimise the number of new haplotypes you have to invent’. (To obtain ‘HAPINFERX’ software, apply to [email protected].) The method can suffer either from having too many solutions or from having none (although convergence is an issue common to all haplotype inference algorithms). There is also no guarantee that Clark’s algorithm reaches the global minimum for the ‘minimise haplotype number’ constraint. This latter problem is fixed in a more recent algorithm2 (‘HAPAR’; apply to [email protected]). Phylogenetic parsimony methods have been explored by Daniel Gusfield and colleagues (‘GPPH’, ‘DPPH’ and ‘BPPH’; http://wwwcsif.cs.ucdavis.edu/~gusfield/). The constraint here is ‘minimise the number of ancestral recombination events required to link the newly invented haplotypes’. As one might expect, this constraint works well in small, tightly linked genomic regions and less well in bigger regions.3 Because parsimony algorithms