An introduction to effective use of enrichment analysis software

Hannah Tipney* and Lawrence Hunter
Center for Computational Pharmacology, University of Colorado Denver, Aurora, CO 80045, USA
*Correspondence to: Tel: +1 303 724 3369; E-mail: [email protected]
Date received (in revised form): 28th January, 2010

Abstract

In recent years, there has been an explosion in the range of software available for annotation enrichment analysis. Three classes of enrichment algorithms and their associated software implementations are introduced here. Their limitations and caveats are discussed, and direction for tool selection is given.

Keywords: software, enrichment, gene enrichment analysis, GSEA, pathway analysis, gene ontology, gene list

What is enrichment analysis and why is it useful?

The final stage of many proteomic, genetic or metabolic analyses is the production of a list of ‘interesting’ biomolecules. Prominent examples include lists of genes ranked by differential expression or co-expression in microarray experiments, lists of single nucleotide polymorphism (SNP)-containing genes ranked by p-values for genetic association with a phenotype of interest in a genome-wide association study, and computationally generated lists of putative transcription factor or miRNA targets ordered by probability.

Unfortunately, such ranked lists tend to be devoid of structure and lacking in context. Simply reviewing a list makes it difficult to determine how, or even if, the genes and their protein products interact with each other or influence the biological processes under study, or even what their ‘normal’ behaviour might be. Extensive exploration of literature and databases is required to answer even rudimentary questions such as: ‘What do this gene and its protein product do? How and where do they do it? Does it make sense to see it on this list? Does it interact with other genes/proteins? Does its behaviour change during disease, disorder or therapy?’ Manual gene-by-gene searches, especially across large lists of genes, are overwhelming and frequently unachievable tasks. Equally, ranked lists of genes do little to replicate the intricate reality of biology, where genes and proteins work together in complex interacting groups to create functioning systems. Focusing on a collection of interesting genes or proteins as a whole is not only more biologically intuitive, but also tends to increase statistical power and reduce dimensionality.

Understanding the functional significance of such lists of genes, although daunting, is therefore a critical task. Annotation enrichment (sometimes called pathway analysis¹) has become the go-to secondary analysis for collections of genes identified by high-throughput genomic methods, owing to its ability to provide valuable insight into the collective biological function underlying a list of genes. By systematically mapping genes and proteins to their associated biological annotations (such as gene ontology [GO] terms² or pathway membership) and th