Bioinformatics methods for identifying candidate disease genes


Marc A. van Driel1 and Han G. Brunner2*

1 Molecular Biology Department, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
2 Department of Human Genetics, University Medical Centre Nijmegen, Geert Grooteplein 10, Nijmegen, The Netherlands
* Correspondence to: Tel: +31 24 3614017; Fax: +31 24 3668752; E-mail: [email protected]

Date received (in revised form): 28th December 2005

Abstract

With the explosion in genomic and functional genomics information, methods for disease gene identification are rapidly evolving. Databases are now essential to the process of selecting candidate disease genes. Combining positional information with disease characteristics and functional information is the usual strategy by which candidate disease genes are selected. Enrichment for candidate disease genes, however, depends on the skills of the operating researcher. Over the past few years, a number of bioinformatics methods that enrich for the most likely candidate disease genes have been developed. Such in silico prioritisation methods may be further improved by completing datasets, by developing standardised ontologies across databases and species and, ultimately, by integrating different strategies.

Keywords: bioinformatics, candidate disease gene prediction

Introduction

With the increase in accessible data and the development of novel molecular biology techniques, new methods for the identification of disease genes are evolving. Linkage studies and mutation screening are becoming easier, and the number of identified (disease) genes is increasing rapidly. The human genome sequence was completed in 2003, and the number of genes is now estimated at 20,000–25,000.1,2 With all the genetics technology in place, identification of disease-related mutations in Mendelian single-gene disorders mainly depends on having the right patients and families. The genetic analysis of complex diseases remains a difficult task, however, and most genes for multifactorial disease remain to be discovered.

Genetic mapping by linkage is a mainstay of human genetics research. While positional information reduces the number of genes that are candidates for causing the disease, this reduction is often not sufficient for rapid disease gene identification. The aim of candidate gene prioritisation methods is to choose, for detailed mutation analysis, those genes that are most likely to be the cause of the disease. This is especially relevant since positional methods may leave up to 100 different genes as candidates. Hence, additional information to be used for prioritisation is essential.

Databases have become a core source for today’s gene hunters. Retrieval systems such as the National Center for Biotechnology Information’s Entrez,3 the Sequence Retrieval System4 and Maarten’s Retrieval System5 provide easy and fast access to a collection of frequently used databases. The main focus of these retrieval systems is to fetch a s