Spectrogram Analysis of Genomes
David Sussillo
Department of Electrical Engineering, Columbia University, NY 10027, USA
Email: [email protected]

Anshul Kundaje
Department of Electrical Engineering, Columbia University, NY 10027, USA
Email: [email protected]

Dimitris Anastassiou
Department of Electrical Engineering, Center for Computational Biology and Bioinformatics (C2B2) and Columbia Genome Center, Columbia University, NY 10027, USA
Email: [email protected]

Received 28 February 2003; Revised 22 July 2003

We perform frequency-domain analysis of the genomes of various organisms using tricolor spectrograms, identifying several types of distinct visual patterns characterizing specific DNA regions. We relate patterns and their frequency characteristics to the sequence characteristics of the DNA. At times, the spectrogram patterns can be related to the structure of the corresponding protein region by using various public databases such as GenBank. Some patterns are explained by the biological nature of the corresponding regions, which relate to chromosome structure and protein coding, and others have as yet unknown biological significance. We found biologically meaningful patterns on scales ranging from millions of base pairs down to a few hundred base pairs. Chromosome-wide patterns include periodicities ranging from 2 to 300. The color of the spectrogram depends on the nucleotide content at specific frequencies, and therefore can be used as a local indicator of CG content and other measures of relative base content. Several smaller-scale patterns are found to represent different types of domains made up of various tandem repeats.

Keywords and phrases: DNA spectrograms, frequency-domain analysis, genome analysis.
1. INTRODUCTION
Color spectrograms of biomolecular sequences were introduced in [1, 2] as visualization tools providing information about the local nature of DNA stretches. These spectrograms give a simultaneous view of the local frequency throughout the nucleotide sequence, as well as the local nucleotide content indicated by the color of the spectrogram. They are helpful not only for the identification of genes and other regions of known biological significance, but also for the discovery of as yet unknown regions of potential significance, characterized by distinct visual patterns in the spectrogram that are not easily detectable by character string analysis. They have also been found to give global information about whole chromosomes. In this paper, we discuss the features and patterns that such spectrograms reveal. We applied a slightly modified version (described below) of the spectrogram development tool introduced in [1, 2] that provides a more direct manifestation of the local relative nucleotide content in the color of the spectrogram, and explored the characteristic patterns in the genomes of various organisms. We created color spectrograms of various frequency bandwidths and sequence lengths. Although the genomes of these organisms vary greatly in size, chromosome numb
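
To make the idea of a color spectrogram concrete, the following is a minimal sketch, not the tool of [1, 2] or the modified version used in this paper. It assumes one binary indicator sequence per base, a sliding-window DFT of each, and a linear mix of the four magnitude spectra into red, green, and blue. The function names, the window and step sizes, and the weight matrix `MIX` are all hypothetical choices made for illustration.

```python
import numpy as np

BASES = "ACGT"

# Hypothetical colour weights (rows: R, G, B; columns: A, C, G, T).
# This is an illustrative assumption, not the mapping used in the paper.
MIX = np.array([[1.0, 0.0, 0.5, 0.0],
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.0, 0.5, 1.0]])


def indicator_sequences(seq):
    """Return a (4, len(seq)) array: row b is 1.0 where seq has base BASES[b]."""
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    return np.stack([(codes == ord(b)).astype(float) for b in BASES])


def base_spectrogram(seq, window=60, step=10):
    """Sliding-window DFT magnitude of each indicator sequence.

    Returns an array of shape (4, n_windows, window // 2 + 1).
    """
    u = indicator_sequences(seq)
    starts = range(0, u.shape[1] - window + 1, step)
    frames = np.stack([u[:, i:i + window] for i in starts], axis=1)
    # Subtract each window's mean so the zero-frequency bin does not dominate.
    frames = frames - frames.mean(axis=2, keepdims=True)
    return np.abs(np.fft.rfft(frames, axis=2))


def to_rgb(spec, mix=MIX):
    """Mix the four per-base spectra into an (n_windows, n_freqs, 3) image in [0, 1]."""
    rgb = np.einsum("cb,bwf->wfc", mix, spec)
    peak = rgb.max()
    return rgb / peak if peak > 0 else rgb


# Toy input: a perfect period-3 tandem repeat.
seq = "ATG" * 100
spec = base_spectrogram(seq)
rgb = to_rgb(spec)
```

For this toy repeat, every window of the A-indicator spectrum peaks at frequency bin `window / 3`, which is the kind of horizontal line a tandem repeat would draw in the image; the color at that bin reflects which bases carry the periodicity under the assumed weights.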