Genetic Variation Methods and Protocols

With the continuing advances in sequencing technologies and the availability of thousands of distinct human genomes, we are fast approaching the day when "personal genomes" become a standard study measure and a routine component of personal health records.



1. Introduction

Biology is an information-driven science. This is self-evident in the scale of the biological data resources built to support genome projects, transcriptomics, whole-genome scans, etc. The increasing quantity of available data means that there are often many challenges in getting the information you want (1). The productionised science approach of the past few decades has provided biological knowledge and technological infrastructure that have dramatically increased the diversity, coverage and often the quality of genomic information. Fortunately, the importance of the data standards underpinning this data was recognised at an early stage (2), and the range and depth of ontologies being developed by the community help to bring meaning to the data.

Michael R. Barnes and Gerome Breen (eds.), Genetic Variation: Methods and Protocols, Methods in Molecular Biology, vol. 628, DOI 10.1007/978-1-60327-367-1_3, © Springer Science + Business Media, LLC 2010


Woollard

Long gone are the days when you could maintain key information in spreadsheets, and it is increasingly difficult, even in a well-resourced organisation, to maintain comprehensive, integrated systems of genomic and related data. One of the key reasons is that genomic data is rarely static: there are frequent updates, and informatics systems need to integrate these updates and diverse data sources in a meaningful way. We are now increasingly reliant on querying data at the data source sites or at key data integration centres, e.g. ENSEMBL, UCSC, Mouse Genome Informatics and NCBI (Table 1). This trend is set to continue; indeed, the biomedical community is following a similar route to the trailblazers of “big science”, the physics community, through an increased reliance on shared supercomputer centres (1).

1.1. Tools for Querying Genomic Data

For most scientists, access to genomic data does not require a supercomputer; it usually means using web-based graphical user interfaces (GUIs), e.g. the ENSEMBL and UCSC genome browsers (Table 1). These are well designed and allow you to ask relatively simple questions across impressively comprehensive arrays of data sources. Genome browsers are generally designed to be queried by a single gene, SNP or genomic region, allowing you to visualise and focus in on the relevant data, such as SNPs, transcripts and promoter regions. If you wish to query with multiple genes or genomic regions, you can use web applications such as BioMart/EnsMart (3) and the UCSC Table Browser (4). BioMart does allow you to run

Table 1
Selected list of tools for querying genomic data

Tools                                      URL
Ensembl Genome Browser                     http://www.ensembl.org
UCSC Genome Browser                        http://genome.ucsc.edu/
NCBI Mapview                               http://www.ncbi.nlm.nih.gov/mapview/
Mouse Genome Informatics (Jackson Labs)    http://www.informatics.jax.org/genes.shtml
Galaxy                                     http://main.g2.bx.psu.edu/
BioMart/ENSMART                            http://www.biomart.org
Taverna                                    http://taverna.sourceforge.net/
Ensembl API                                http://www.ensembl.org/info/docs/api/index.html

NCBI A