TRII: A Probabilistic Scoring of Drosophila melanogaster Translation Initiation Sites
Research Article
Michael P. Weir1 and Michael D. Rice2
1 Department of Biology, Wesleyan University, Middletown, CT 06459, USA
2 Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT 06459, USA
Correspondence should be addressed to Michael P. Weir, [email protected]
Received 29 April 2010; Revised 23 August 2010; Accepted 14 October 2010
Academic Editor: Yufei Huang
Copyright © 2010 M. P. Weir and M. D. Rice. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Relative individual information is a measurement that scores the quality of DNA- and RNA-binding sites for biological machines. The development of analytical approaches to increase the power of this scoring method will improve its utility in evaluating the functions of motifs. In this study, the scoring method was applied to potential translation initiation sites in Drosophila to compute Translation Relative Individual Information (TRII) scores. The weight matrix at the core of the scoring method was optimized based on high-confidence translation initiation sites identified by using a progressive partitioning approach. Comparing the distributions of TRII scores for sites of interest with those for high-confidence translation initiation sites and random sequences provides a new methodology for assessing the quality of translation initiation sites. The optimized weight matrices can also be used to describe the consensus at translation initiation sites, providing a quantitative measure of preferred and avoided nucleotides at each position.
1. Introduction

Understanding how biological machines work in the context of genomes, transcriptomes, and proteomes requires appropriate languages and representations for successful modeling of their biological processes. Information theory provides one of the foundations for this goal and underlies sequence motif-finding algorithms such as MEME [1]. For example, information theory gives us powerful ways to analyze and score sequence motifs in RNAs that are targeted by biological machines such as the spliceosome or ribosome [2–4]. The approach reveals, for each nucleotide position in the motif, which nucleotide choices are preferred and which are avoided. For any single RNA sequence, the collective deviations from the preferred nucleotides must be sufficiently small for the machine to successfully function on that RNA.

In this study, several analytical approaches are integrated to increase the power of these scoring methods using Drosophila translation initiation sites as a model setting. As an introduction, we describe first the information theoretic basis for these scoring methods. Motifs of functional importance can be quantitatively assessed through their sequence conservation, measured as information content in sets of aligned sequences [2, 5,