On the Problem of Reconstructing a Mixture of RNA Structures


Torin Greenwood¹ · Christine E. Heitsch²

Received: 23 January 2020 / Accepted: 8 September 2020 © Society for Mathematical Biology 2020

Abstract A growing number of RNA sequences are now known to exist in some distribution with two or more different stable structures. Recent algorithms attempt to reconstruct such mixtures using the list of nucleotides in a sequence in conjunction with auxiliary experimental footprinting data. In this paper, we demonstrate some challenges which remain in addressing this problem; in particular, we consider the difficulty of reconstructing a mixture of two RNA structures across a spectrum of different relative abundances. Although progress has been made in identifying the stable structures present, it remains nontrivial to predict the relative abundance of each within the experimentally sampled mixture. Because the ratio of structures present can change depending on experimental conditions, it is the footprinting data, and not the sequence, which must encode information on changes in the relative abundance. Here, we use simulated experimental data to demonstrate that there exist RNA sequences and relative abundance combinations which cannot be recovered by current methods. We then prove that this is not a single exception, but rather part of the rule. In particular, we show, using a Nussinov–Jacobson model, that recovering the relative abundances is difficult for a large proportion of RNA structure pairs. Lastly, we use information theory to establish a framework for quantifying how useful auxiliary data is in predicting the relative abundance of a structure. Together, these results demonstrate that aspects of the problem of reconstructing a mixture of RNA structures from experimental data remain open.

Keywords RNA secondary structure · Thermodynamic optimization · Auxiliary data

Corresponding author: Torin Greenwood

¹ North Dakota State University, Fargo, USA

² Georgia Institute of Technology, Atlanta, USA


1 Introduction

Determining the structural conformations of an RNA sequence reveals functional information. Identifying structures in the laboratory is difficult, so discrete optimization methods are used instead. These methods use the list of nucleotides in a sequence to predict two-dimensional approximations of structures, called secondary structures, that still encode functional information. However, an increasing number of RNA molecules are now known to fold into multiple stable structures (Leonard et al. 2013; Spasic et al. 2018). For example, approximately 20% of eukaryotic RNA folds into multiple structural conformations in vivo (Lu et al. 2016). Differing conformations can also yield multiple biological functions for the same RNA sequence, as is the case for riboswitches (Antunes et al. 2018). In response, single-structure prediction methods have been refined to find distributions of RNA structures, as reviewed in Schroeder (201