Pattern Matching in RNA Structures

RNA plays key roles in many biological processes, and its function depends largely on its three-dimensional structure. We describe a comparative approach to learning biologically important RNA structures, including those that are not the predicted minimum

  • PDF / 2,006,337 Bytes
  • 14 Pages / 430 x 660 pts Page_size
  • 1 Downloads / 205 Views

DOWNLOAD

REPORT


Abstract. RNA plays key roles in many biological processes, and its function depends largely on its three-dimensional structure. We describe a comparative approach to learning biologically important RNA structures, including those that are not the predicted minimum free energy (MFE) structure. Our approach identifies the greatest conserved structure(s) in a set of RNA sequences, even in the presence of sequences that have no conserved features. We convert RNA structures to a graph representation (XIOS RNA graph) that includes pseudoknots, and mutually exclusive structures, thereby simultaneously representing ensembles of RNA structures. By modifying existing algorithms for maximal subgraph isomorphism, we can identify the similar portions of the graphs and integrate this with MFE structure prediction tools to identify biologically relevant near-MFE conserved structures.

1

Introduction

RNA molecules perform a variety of important biological functions in addition to carrying information from the chromosome to the ribosome, or acting as structural scaffolds. Catalytic RNAs play key roles in translation, RNA processing and splicing, and gene regulation [1]. Motifs that are important for RNA function are structural and correspond to base-paired regions of secondary structure, which in turn, provide the scaffold for the three-dimensional fold of the RNA [2, 3]. RNA sequences that have the same structural motifs may have sequences that are impossible to align because they have no detectable sequence similarity. While programs that predict RNA secondary structure have been available since the 1980s, RNA structure prediction is handicapped by both biochemical and computational limitations. Firstly, RNA exists as an ensemble of rapidly interconverting structures. Protein structures (usually) show relatively minor fluctuations from a single minimum free-energy state. The case is much different for RNA where there are usually many structures with similar free-energies; these structures may be distinctly different in terms of base-pairing [4, 5]. Secondly, while we know that pseudoknot *

Corresponding author. Tel.: +1 7654946933; Fax: +1 765-496-1189.

I. Măndoiu, R. Sunderraman, and A. Zelikovsky (Eds.): ISBRA 2008, LNBI 4983, pp. 317–330, 2008. © Springer-Verlag Berlin Heidelberg 2008

318

K. Li et al.

structures are very important in RNA structure and catalytic function [6], it remains difficult to reliably predict pseudoknotted structures. This is due both to our incomplete understanding of the energetics of pseudoknot formation, as well as to the computational time complexity. The most efficient pseudoknot prediction algorithms, e.g., pknotRG, have O(n4) time for certain classes of RNAs[7]), but achieve this by placing significant limitations on which structures can be found. Memory complexity of RNA structure prediction is O(n2), where n is the length of the RNA sequence, and usually ranges from 10,000-100,000 bases for primary RNA transcripts. In biology, functionally important features can often be recognized bec