Predicting Long non-coding RNAs through feature ensemble learning


RESEARCH

Open Access

Yanzhen Xu†, Xiaohan Zhao†, Shuai Liu and Wen Zhang*

From 2019 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2019), San Diego, CA, USA. 18-21 November 2019

Abstract

Background: Advances in sequencing technologies have generated a large number of transcripts, and lncRNAs are an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental methods for identifying lncRNAs are time-consuming and labor-intensive, so efficient computational methods for lncRNA prediction are in demand.

Results: In this paper, we propose two lncRNA prediction methods based on feature ensemble learning strategies, named LncPred-IEL and LncPred-ANEL. Specifically, we encode sequences into six different types of features, including transcript-specified features and general sequence-derived features. We then consider two feature ensemble strategies to utilize and integrate the information in the different feature types: iterative ensemble learning (IEL) and attention network ensemble learning (ANEL). IEL ensembles base predictors built on the six feature types in a supervised, iterative manner. ANEL introduces an attention mechanism-based deep learning model that ensembles features by adaptively learning the weight of each feature type. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs from other transcripts in feature space. Moreover, comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods also perform well in cross-species lncRNA prediction.

Conclusions: LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.

Keywords: lncRNA prediction, Attention mechanism, Feature ensemble learning
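To make the idea of adaptively weighting feature types concrete, the following is a minimal NumPy sketch of softmax attention over several feature blocks. It is an illustration under stated assumptions, not the ANEL architecture itself: the function names (`softmax`, `attention_fuse`), the tanh projection, the scoring vector, the feature dimensions, and the random inputs are all hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(feature_blocks, projections, score_vector):
    """Fuse several feature types using per-sample attention weights.

    feature_blocks: list of K arrays, block k has shape (n_samples, d_k)
    projections:    list of K arrays, projection k has shape (d_k, hidden)
    score_vector:   array of shape (hidden,), scores each embedded block

    Returns (fused, weights): fused has shape (n_samples, hidden) and
    weights has shape (n_samples, K), with each row summing to 1.
    """
    # Project every feature type into a shared hidden space: (n, K, hidden).
    embedded = np.stack(
        [np.tanh(x @ w) for x, w in zip(feature_blocks, projections)], axis=1
    )
    # One scalar score per sample and feature type: (n, K).
    scores = embedded @ score_vector
    # Normalize scores into attention weights over the K feature types.
    weights = softmax(scores, axis=1)
    # Weighted sum of the embedded feature types: (n, hidden).
    fused = (weights[:, :, None] * embedded).sum(axis=1)
    return fused, weights


# Hypothetical setup: six feature types with made-up dimensions.
rng = np.random.default_rng(0)
dims = [4, 8, 16, 3, 5, 10]
n_samples, hidden = 5, 7
blocks = [rng.normal(size=(n_samples, d)) for d in dims]
projs = [rng.normal(size=(d, hidden)) for d in dims]
v = rng.normal(size=hidden)

fused, weights = attention_fuse(blocks, projs, v)
```

In a trained model the projections and the scoring vector would be learned jointly with a classifier on top of `fused`; here they are random, so the weights only demonstrate the mechanics.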

* Correspondence: [email protected]
† Yanzhen Xu and Xiaohan Zhao contributed equally to this work.
College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

Background

In the last few decades, due to the development of high-throughput sequencing technologies, a great number of transcripts have been generated [1]. Transcripts are the products of DNA transcription, including mRNAs, tRNAs, rRNAs, and other non-coding RNAs (ncRNAs). NcRNAs are a class of RNAs that do not encode any protein, and lncRNAs (long non-coding RNAs) are ncRNAs with lengths exceeding 200 nucleotides (nt). Although lncRNAs are not translated into proteins, they are of great significance in various cellular development processes, such as gene expression and regulation [2], gene silencing [3], and RNA modification [4]. More importantly, lncRNAs have been shown to be associated with many diseases; for instance, DD3 is related to prostate cancer [5] and BACE1-AS is related to Alzheimer’s disease [6]. Predicting lncRN