Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains

  • PDF / 8,208,874 Bytes
  • 11 Pages / 595.276 x 790.866 pts Page_size
  • 73 Downloads / 200 Views

DOWNLOAD

REPORT


(2020) 15:19 Miladi et al. Algorithms Mol Biol https://doi.org/10.1186/s13015-020-00179-w

Open Access

RESEARCH

Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains Milad Miladi1  , Martin Raden1, Sebastian Will2,3 and Rolf Backofen1,4* 

Abstract  Motivation:  Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n6 ) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results:  Here, we introduce a novel variant of Sankoff’s algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Conclusions:  Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff’s algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Keywords:  RNA secondary structure, Alignment and folding of RNAs, Structural bioinformatics Background In all forms of life, RNAs play essential roles that go beyond coding as messenger RNAs for the synthesis of proteins. Non-coding RNAs (ncRNAs) directly regulate cellular mechanisms, where some are known to be conserved for billions of years [1]. ncRNAs often have only weak sequence conservation, since their (conserved) structure crucially determines their function. Therefore, inferring the conserved secondary structure of homologs—most often, based on RNA alignments, is *Correspondence: [email protected]‑freiburg.de 4 Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, Freiburg, Germany Full list of author information is available at the end of the article

central for the discovery and annotation of functional RNAs. RNA structural alignment algorithms can be classified depending on whether they fold and align simultaneously or in turn [2]. The gold standard for computing reliable alignments (and common structures) of RNAs is still the simultaneous algorithm proposed by Sankoff in 1985 [3]. By simultaneously aligning and folding the RNAs, it resolves the vicious cycle that reliable RNA alignments mu