Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA pri
RESEARCH
Open Access
Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure
Lei Deng1, Youzhi Liu1, Yechuan Shi1, Wenhao Zhang2, Chun Yang3* and Hui Liu2*
From 2019 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2019), San Diego, CA, USA. 18–21 November 2019
Abstract
Background: RNA-binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. Identifying RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because of the high cost of biochemical assays. Many studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in bioinformatics by virtue of its ability to learn generalized representations from DNA and protein sequences.
Results: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than the traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets.
Conclusions: Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
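As a rough illustration of the k-mer idea described in the abstract, the sketch below splits RNA sequences into overlapping k-mers and looks up a dense vector for each k-mer instead of a one-hot vector. It is not taken from the DeepRKE source code: the function names, k = 3, stride 1, the 4-dimensional vectors and the toy sequences are all assumptions, and the randomly initialised vectors only stand in for embeddings that a word embedding algorithm would learn from data.

```python
import random


def kmers(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mers (k and stride are illustrative)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(corpus):
    """Assign each distinct k-mer an integer index in order of first appearance."""
    vocab = {}
    for tokens in corpus:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(index, size):
    """Traditional sparse encoding, shown only for contrast."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def init_embeddings(vocab, dim=4, seed=0):
    """Placeholder dense vectors; a trained embedding model would replace these."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for t in vocab}


# Toy sequences (hypothetical, not from the benchmark datasets).
sequences = ["AUGGCUAC", "GGCUACGA"]
corpus = [kmers(s) for s in sequences]
vocab = build_vocab(corpus)
embeddings = init_embeddings(vocab)

# Each sequence becomes a list of dense vectors, one per k-mer.
encoded = [[embeddings[t] for t in tokens] for tokens in corpus]
```

With one-hot encoding every pair of distinct k-mers is equally dissimilar, whereas learned dense vectors can place related k-mers close together. The same tokenisation could in principle be applied to a secondary-structure string, and the resulting vector sequences are the kind of input a CNN or BiLSTM layer would consume.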
Keywords: RNA-binding proteins, Binding sites, Distributed representation, k-mer, Deep learning, Convolutional neural network, Bidirectional long short-term memory network
*Correspondence: [email protected]; [email protected]
2 Aliyun School of Big Data, Changzhou University, 213164, Changzhou, China
3 Department of Obstetrics, The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou, China
Full list of author information is available at the end of the article
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com