Are spliced ncRNA host genes distinct classes of lncRNAs?



ORIGINAL ARTICLE

Rituparno Sen1 · Jörg Fallmann1 · Maria Emília M. T. Walter2 · Peter F. Stadler1,3,4,5,6,7

Received: 13 September 2020 / Accepted: 10 November 2020 / Published online: 21 November 2020
© The Author(s) 2020

Abstract

Many small nucleolar RNAs and many of the hairpin precursors of miRNAs are processed from long non-protein-coding host genes. In contrast to their highly conserved and heavily structured payload, the host genes feature poorly conserved sequences. Nevertheless, there is mounting evidence that the host genes have biological functions beyond their primary task of carrying a ncRNA as payload. So far, no connections between the function of the host genes and the function of their payloads have been reported. Here we investigate whether there is evidence for an association of host gene function or mechanisms with the type of payload. To assess this hypothesis, we test whether the miRNA host genes (MIRHGs), snoRNA host genes (SNHGs), and other lncRNA host genes can be distinguished based on sequence and/or structure features unrelated to their payload. A positive answer would imply a functional and mechanistic correlation between host genes and their payload, provided the classification does not depend on the presence and type of the payload. A negative answer would indicate that, to the extent that secondary functions are acquired, they are not strongly constrained by the prior, primary function of the payload. We find that the three classes can be distinguished reliably when the classifier is allowed to extract features from the payloads. They become virtually indistinguishable, however, as soon as only the sequence and structure of parts of the host gene distal from the snoRNA or miRNA payload are used for classification. This indicates that the functions of MIRHGs and SNHGs are largely independent of the functions of their payloads. Furthermore, there is no evidence that the MIRHGs and SNHGs form coherent classes of long non-coding RNAs distinguished by features other than their payloads.

Keywords: LncRNA · Host gene · MiRNA · SnoRNA · k-mers · Secondary structure · Random forest · Machine learning

Introduction

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12064-020-00330-6) contains supplementary material, which is available to authorized users.

A wide variety of molecular and biological functions have been reported for long non-coding RNAs (lncRNAs), recently reviewed, e.g., by Yao et al. (2019). Specific lncRNAs regulate chromosome architecture and chromatin

Maria Emília M. T. Walter [email protected]

3 German Centre for Integrative Biodiversity Research (iDiv) Halle‑Jena‑Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, Germany

Peter F. Stadler [email protected]‑leipzig.de

4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, German