The Aldehyde Dehydrogenase Gene Superfamily Resource Center
William Black and Vasilis Vasiliou*
Molecular Toxicology and Environmental Health Sciences Program, Department of Pharmaceutical Sciences, University of Colorado Denver, Aurora, CO 80045, USA
*Correspondence to: E-mail: [email protected]
Date received (in revised form): 23 October 2009
Abstract
The website www.aldh.org is a publicly available database of nomenclature, functional information and molecular sequence information for members of the aldehyde dehydrogenase (ALDH) gene superfamily in animals, plants, fungi and bacteria. The site is organised into gene-specific records. It provides synopses of ALDH gene records, maps trivial names to the correct nomenclature and links global accession identifiers with source data. Server-side alignment software characterises the integrity of each sequence relative to the latest genomic assembly and provides identifier-specific detail reports, including a graphical presentation of the transcript’s exon–intron structure, its size, coding sequence, genomic strand and locus. Also included is a summary of substrates, inhibitors and enzyme kinetics. The site provides reference lists and is designed to facilitate data mining by interested investigators.
Keywords: Genomic database, aldehyde dehydrogenase, ALDH, nomenclature, gene superfamily
Introduction
The completion of various genome projects and the growing trend towards high-throughput data production have created a significant knowledge base of molecular sequence data across a broad spectrum of species. This increase in available sequence information has led to a widening gap between the available raw sequence data and their functional analyses by molecular biological methods or other genetic approaches. As a consequence, the field of bioinformatics has rapidly developed as an essential aid for data analysis. A number of large-scale, gene-specific databases, including the National Center for Biotechnology Information (NCBI)’s Entrez Gene¹ and the European Bioinformatics Institute/Wellcome Trust Sanger Institute’s Ensembl databases,² have been developed to report and catalogue molecular sequence data. The intrinsic format of these databases, which attempt to cover all genes for all species or all genes for a given species (eg the mouse genome database³), has significant limitations, however. These include errors in sequence alignments due to a reliance on automated algorithms, poorly defined reference sequences and improper gene nomenclature. Other issues include a lack of identification and/or categorisation of alternatively spliced transcriptional variants, as well as erroneous functional characterisations that arise because generalised gene ontology entries do not distinguish the individual gene from other members of its gene superfamily. To address these limitations, we have developed a gene-specific database architecture and web-based scripting system which is tailored to report both the molecular sequence and functional data for all members of an individual gene