Genomic Signals of Reoriented ORFs
Paul Dan Cristea
Biomedical Engineering Center, Politehnica University of Bucharest, Splaiul Independentei 313, Bucharest 77206, Romania
Email: [email protected]

Received 14 March 2003; Revised 12 September 2003

Complex representation of nucleotides is used to convert DNA sequences into complex digital genomic signals. The analysis of the cumulated phase and unwrapped phase of DNA genomic signals reveals large-scale features of eukaryote and prokaryote chromosomes that result from statistical regularities of base and base-pair distributions along DNA strands. By reorienting the chromosome coding regions, a "hidden" linear variation of the cumulated phase has been revealed, along with the conspicuous almost linear variation of the unwrapped phase. A model of chromosome longitudinal structure is inferred on these bases.

Keywords and phrases: genomic signals, open reading frames, ORF orientation.
1. INTRODUCTION
The conversion of nucleotide sequences into digital signals offers the opportunity to apply signal processing methods to the analysis of genomic information. Using the genomic signal approach, long-range features of DNA sequences, maintained over distances of 10^6 to 10^8 base pairs, that is, at the scale of whole chromosomes, have been found [1, 2, 3, 4, 5, 6, 7]. One of the most conspicuous results is that the unwrapped phase of the complex genomic signal varies almost linearly along all investigated chromosomes, for both prokaryotes and eukaryotes. The slope is specific to each taxon and chromosome. Such behavior reveals a large-scale regularity in the distribution of pairs of successive nucleotides, that is, a rule for second-order statistics: the difference between the frequency of positive nucleotide-to-nucleotide transitions (A → G, G → C, C → T, T → A) and that of negative transitions (the opposite ones) along a strand of nucleic acid tends to be small, constant, and taxon and chromosome specific. This rule is similar to Chargaff's rules, which refer to the frequencies of occurrence of nucleotides, that is, to first-order statistics [8].

The paper shows that the abrupt changes in nucleotide frequencies along DNA strands of prokaryote chromosomes, as revealed by the piecewise linear variation of the cumulated phase of complex genomic signals [1, 2, 3, 4, 5, 6, 7] or by the skew diagrams [9, 10, 11], are the effect of corresponding abrupt changes in the distribution of direct and inverse open reading frames (ORFs) along the strand. It is also shown that, by reorienting all the negative (inverse) ORFs in the direction of the positive (direct) ones, an almost linear variation of the cumulated phase along the concatenated sequence is obtained, corresponding to almost constant frequencies of nucleotides along the entire chain of concatenated reoriented ORFs. This large-scale homogeneity of the reoriented ORFs, together with the taxon-specific large-scale regularities of the actual DNA strands, suggests that the distribution of direct and inverse codi
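The two operations described in this introduction, counting positive versus negative transitions and reorienting inverse ORFs before concatenation, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the (start, end, strand) ORF format, and the reading of "reorienting" as taking the reverse complement of each inverse ORF are assumptions made for illustration only.

```python
from collections import Counter

# Positive transitions as listed in the text; negative ones are their reversals.
POSITIVE = {("A", "G"), ("G", "C"), ("C", "T"), ("T", "A")}
NEGATIVE = {(b, a) for (a, b) in POSITIVE}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transition_balance(seq):
    """(positive - negative transitions) / number of successive pairs.

    Pairs that are neither positive nor negative (e.g. A -> A, A -> C)
    count as neutral.
    """
    seq = seq.upper()
    pos = neg = 0
    for pair in zip(seq, seq[1:]):
        if pair in POSITIVE:
            pos += 1
        elif pair in NEGATIVE:
            neg += 1
    n_pairs = max(len(seq) - 1, 1)  # avoid division by zero on short input
    return (pos - neg) / n_pairs


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reorient_and_concatenate(genome, orfs):
    """Concatenate ORFs, with inverse ones turned into the direct orientation.

    Assumption: `orfs` holds (start, end, strand) tuples with 0-based,
    half-open coordinates and strand '+' (direct) or '-' (inverse).
    """
    parts = []
    for start, end, strand in orfs:
        piece = genome[start:end].upper()
        parts.append(piece if strand == "+" else reverse_complement(piece))
    return "".join(parts)


def nucleotide_frequencies(seq):
    """Relative frequencies of A, C, G, T (first-order statistics)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT") or 1
    return {b: counts[b] / total for b in "ACGT"}
```

On real data, one would apply `transition_balance` and `nucleotide_frequencies` to sliding windows along a chromosome and along the output of `reorient_and_concatenate`, then compare how much the values vary in each case.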