RBPsuite: RNA-protein binding sites prediction suite based on deep learning
SOFTWARE
Open Access
Xiaoyong Pan1*†, Yi Fang1†, Xianfeng Li2, Yang Yang3 and Hong-Bin Shen1*
Abstract
Background: RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs. However, training deep learning models is time-consuming and computationally intensive.
Results: Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs. For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS. For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using CRIP, which we developed. RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between each segment and the RBPs. RBPsuite then detects verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence.
Conclusions: RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
Keywords: Deep learning, RNA-binding proteins, Linear RNAs, Circular RNAs
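The segmentation step described in the abstract can be illustrated with a minimal sketch. Only the 101-nucleotide segment length comes from the text. The function name `split_into_segments`, the default non-overlapping step, and the padding of a short final segment with `N` are assumptions made for illustration and may differ from what RBPsuite actually does.

```python
from typing import List, Tuple

SEGMENT_LENGTH = 101  # segment size stated in the abstract


def split_into_segments(
    sequence: str,
    length: int = SEGMENT_LENGTH,
    step: int = SEGMENT_LENGTH,  # assumption: non-overlapping windows by default
    pad: str = "N",              # assumption: pad a short final window with N
) -> List[Tuple[int, str]]:
    """Return (start, segment) pairs that together cover the whole sequence."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    if not sequence:
        return []

    segments: List[Tuple[int, str]] = []
    for start in range(0, len(sequence), step):
        window = sequence[start:start + length]
        if len(window) < length:
            # Right-pad the last window so every segment has the same length.
            window += pad * (length - len(window))
        segments.append((start, window))
        if start + length >= len(sequence):
            # This window already reaches the end of the sequence.
            break
    return segments


if __name__ == "__main__":
    rna = "ACGU" * 60  # hypothetical 240-nt input
    for start, segment in split_into_segments(rna):
        print(start, len(segment))
```

Each fixed-length segment would then be passed to an RBP-specific model to obtain a binding score. Setting `step` below `length` yields overlapping windows, which is one way to obtain a smoother score profile along the full-length sequence.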
* Correspondence: [email protected]; [email protected]
† Xiaoyong Pan and Yi Fang contributed equally to this work.
1 Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
Full list of author information is available at the end of the article

Background
RNA-binding proteins (RBPs) are involved in many biological processes, and their binding sites on RNAs can give insights into the mechanisms behind diseases involving RBPs [1]. Identifying RBP binding sites on RNAs is therefore crucial for follow-up analyses, such as assessing the impact of mutations on binding sites. With the development of high-throughput sequencing, the number of experimentally verified RBP binding sites has exploded, e.g. the eCLIP data [2] in ENCODE [3]. However, these CLIP-seq data still cannot provide a full view of the RBP binding landscape, because CLIP-seq relies on gene expression, which can be highly variable between experiments. Nevertheless, these large datasets can serve as training data for machine learning models that predict missing RBP binding sites, which may not be detected in some experiments. For example, GraphProt encodes an RNA sequence and its structure as a graph [4], which is fed into a support vector machine to classify RBP-bound sites from unbound sites. GraphProt can detect the sequence and structure binding preferences of RBPs and further predict RBP binding sites on any input RNA. Because RBPs have different binding preferences, machine learning-based methods train RBP-specific models, one model per RBP. Recently, deep learning-based methods have achieved remarkable