Web-based resources for comparative genomics


Xun Gu1* and Zhixi Su1,2

1 Department of Genetics, Development and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
2 James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
*Correspondence to: Tel: +1 515 294 8075; Fax: +1 515 294 8457; E-mail: [email protected]
Date received (in revised form): 19th May 2005

Abstract

The available web-based genome data and related resources provide great opportunities for biomedical scientists to identify functional elements in a particular genome region or to explore the evolutionary pattern of genome dynamics. Comparative genomics is an indispensable tool for achieving these goals. Because of the broad scope of comparative genomics, it is difficult to address all of its aspects in this short survey. A few currently ‘hot’ topics have therefore been selected and a brief review of the availability of web-based databases and software is given.

Keywords: comparative genomics, software, web-based database

Genome databases for comparative genomics

Multi-genome alignment and gene prediction

Genome-wide databases (see Table 1) typically change rapidly, both in their internal implementation and in the datasets they record. This paper briefly reviews two servers recently made public, which researchers should find to be valuable sources of information. The genome alignment and annotation database (GALA)1 provides access to information on genes (known and predicted), gene ontology, expression patterns, genome alignments and conserved transcription factor binding sites. The binding sites are predicted with TRANSFAC weight matrices, which are estimated from known binding sites and capture their sequence signature.2 For example, given a set of genes expressed in a particular tissue, GALA can identify all of the predicted binding sites for one or more transcription factors of interest that are conserved in mammals. EnsMart is a branch of the Ensembl project3 that integrates data from Ensembl and several other resources, using a ‘warehouse star-schema’ in which central biological objects (eg genes or single nucleotide polymorphisms) are connected to a set of satellite tables, such as disease, transcript and protein family (PFAM) attributes. Thus, EnsMart provides users with fast and effective access to deep data in and around genes.
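The weight-matrix idea behind the binding-site predictions mentioned above can be illustrated with a minimal sketch. It is not the GALA or TRANSFAC implementation: the binding sites are made-up toy sequences, the function names `build_pwm` and `scan` are hypothetical, and the pseudocount, the uniform background and the score threshold are all assumptions chosen for illustration. The sketch estimates a log-odds matrix from aligned known sites and then scores every window of a query sequence against it.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log-odds weight matrix from equal-length binding sites.

    Assumptions: a uniform background frequency and a simple additive
    pseudocount; real resources may use different estimators.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy, made-up binding sites (not taken from TRANSFAC).
known_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(known_sites)

# Hypothetical query sequence; the threshold of 5.0 is arbitrary.
hits = scan("CCATGACTCAGGTT", pwm, threshold=5.0)
for start, score in hits:
    print(f"candidate site at position {start}, score {score:.2f}")
```

A conservation filter of the kind described for GALA could then be approximated by keeping only those hits that also score above the threshold at the aligned positions in the other species, although that step is not shown here.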

Genome-wide alignment servers for two closely related species are available on the web. BLAST,4,5 implemented at the National Center for Biotechnology Information (NCBI), is the most frequently used suite of tools. Several servers were specially designed to align two or more long genomic sequences at high sensitivity while detecting common rearrangements or duplications, for example PipMaker,6 MultiPipMaker,7 zPicture,8 VISTA9 and MAVID.10 These servers are suitable for species such as those from different mammalian orders. Several pipelines have been designed for mammalian genome alignment.11–13 For more distant species, or a