Prediction of RNA secondary structure with pseudoknots using coupled deep neural networks


Biophysics Reports

METHOD

Kangkun Mao¹, Jun Wang¹, Yi Xiao¹ ✉

¹ School of Physics and Key Laboratory of Molecular Biophysics of the Ministry of Education, Huazhong University of Science and Technology, Wuhan 430074, China

Received: 12 June 2020 / Accepted: 3 July 2020 / Published online: 4 August 2020

Abstract

Noncoding RNAs play important roles in cells, and their secondary structures are vital for understanding their tertiary structures and functions. Many methods for predicting RNA secondary structure have been proposed, but reaching high accuracy remains challenging, especially for structures with pseudoknots. Here we present a coupled deep learning model, called 2dRNA, to predict RNA secondary structure. It combines two well-known neural network architectures, bidirectional LSTM and U-net, and needs only the sequence of the target RNA as input. Benchmarks show that our method achieves state-of-the-art performance compared with current methods on a testing dataset. Our analysis also shows that 2dRNA can learn structural information from similar RNA sequences without aligning them.

Keywords: RNA secondary structure prediction, Deep learning, Minimum free energy

INTRODUCTION

RNAs participate in many important biological activities (Xiyuan et al. 2017; Zhao et al. 2016). To do so, they generally need to form correct tertiary structures. It is therefore necessary to know the tertiary structures of RNAs to understand their functions. At present, experimental determination of RNA tertiary structures is more difficult than that of proteins, so many theoretical or computational methods have been proposed to predict RNA tertiary structures (Cao and Chen 2011; Das et al. 2010; Jain and Schlick 2017; Wang et al. 2017; Wang and Xiao 2017; Xu et al. 2014). Although these methods rely on different principles, their performance in every case depends on the accuracy of the RNA secondary structures. Accurate prediction of RNA secondary structure is therefore very important.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s41048-020-00114-x) contains supplementary material, which is available to authorized users.

Traditional methods of RNA secondary structure prediction fall mainly into two categories: single-sequence methods and homologous-sequence methods. Single-sequence methods need only the sequence of the target RNA as input, and most of them are based on thermodynamic models or the minimum free energy principle (Bellaousov et al. 2013; Janssen and Giegerich 2015; Zuker 2003). The accuracy of their predictions therefore depends largely on thermodynamic parameters, which are difficult to determine accurately (Zhao et al. 2018). In contrast to single-sequence methods, homologous-sequence methods, e.g., TurboFold (Tan et al. 2017), use the evolutionary information in homologous sequences to infer their common secondary structure (Lorenz et al. 2011). It was shown that the homologous sequences