Depth-First Search Encoding of RNA Substructures

1 School of Computer, Electronic and Information, Guangxi University, Nanning 530004, China
2 State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
3 Advanced Analytics Institute, University of Technology Sydney, P.O. Box 123 Broadway, Ultimo, NSW 2007, Australia
4 Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
5 Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, P.O. Box 123 Broadway, Ultimo, NSW 2007, Australia

Abstract. RNA structural motifs are important in the RNA folding process. Traditional index-based and shape-based schemas are useful for modeling RNA secondary structures but ignore the structural discrepancies among individual members of an RNA family. Further, in-depth analysis of the underlying substructure patterns remains underdeveloped owing to varied and unnormalized substructures, which hinders our understanding of RNA functions. This article proposes a DFS (depth-first search) encoding for RNA substructures. The results show that our methods are useful in modeling complex RNA secondary structures.

Keywords: Data mining · RNA · Subgraph · Substructure · Support

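This section does not yet define the encoding. As a minimal sketch only, the Python below shows one way a DFS encoding of an RNA secondary structure could look. It assumes pseudoknot-free structures in dot-bracket notation and borrows the edge-tuple form of gSpan's DFS codes from frequent subgraph mining; the keywords hint at that setting, but the text does not confirm that the authors use it. Each base pair becomes a tree node labeled by the loop it closes, and edges are listed as `(from_index, to_index, from_label, to_label)` in depth-first discovery order. The functions `parse_dot_bracket` and `dfs_code` and the labels `E` (exterior loop), `H` (hairpin), `I` (stack, bulge, or interior loop), and `M` (multiloop) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical edge format: (from_index, to_index, from_label, to_label).
Edge = Tuple[int, int, str, str]


@dataclass
class Node:
    start: int                     # 0-based position of '(' (-1 for the virtual root)
    end: int = -1                  # 0-based position of the matching ')'
    children: List["Node"] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Hypothetical loop labels derived from the number of enclosed pairs.
        if self.start == -1:
            return "E"             # exterior loop (virtual root)
        if not self.children:
            return "H"             # hairpin: encloses no further pair
        if len(self.children) == 1:
            return "I"             # stack, bulge or interior loop
        return "M"                 # multiloop: encloses two or more pairs


def parse_dot_bracket(db: str) -> Node:
    """Build the base-pair tree of a pseudoknot-free dot-bracket string."""
    root = Node(start=-1, end=len(db))
    open_pairs: List[Node] = [root]    # innermost open pair is last
    for pos, ch in enumerate(db):
        if ch == "(":
            open_pairs.append(Node(start=pos))
        elif ch == ")":
            if len(open_pairs) == 1:
                raise ValueError(f"unmatched ')' at position {pos}")
            node = open_pairs.pop()
            node.end = pos
            # Siblings close in 5'->3' order, so children stay sequence-ordered.
            open_pairs[-1].children.append(node)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if len(open_pairs) != 1:
        raise ValueError("unmatched '(' in structure")
    return root


def dfs_code(root: Node) -> List[Edge]:
    """List tree edges in pre-order DFS, numbering nodes as they are discovered."""
    code: List[Edge] = []
    next_index = 1                     # the root has index 0
    # Push children in reverse so the 5'-most child is visited first.
    stack = [(child, 0, root) for child in reversed(root.children)]
    while stack:
        node, parent_index, parent = stack.pop()
        index = next_index
        next_index += 1
        code.append((parent_index, index, parent.label, node.label))
        stack.extend((child, index, node) for child in reversed(node.children))
    return code


if __name__ == "__main__":
    print(dfs_code(parse_dot_bracket("(((...)).((...)))")))
    # [(0, 1, 'E', 'M'), (1, 2, 'M', 'I'), (2, 3, 'I', 'H'), (1, 4, 'M', 'I'), (4, 5, 'I', 'H')]
```

Because nucleotides are ordered from 5' to 3', visiting children in sequence order gives each structure a single deterministic code, so structures with the same loop topology receive identical codes. Encoding unordered general graphs would instead require gSpan's search for a minimum DFS code.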
1 Introduction

Most of the sequences studied in the human genome are protein-coding genes. In recent years, however, there has been increasing evidence that the non-coding portion of the genome is of crucial functional importance for normal development and physiology as well as for disease. For example, microRNAs (miRNAs) have been uncovered as key regulators of gene expression at the post-transcriptional level, and epigenetic and genetic defects in miRNAs and their processing machinery are a common hallmark of disease [1]. Non-coding RNAs (ncRNAs) are also emerging as key regulators of embryogenesis by controlling embryonic gene expression.

© Springer International Publishing Switzerland 2016. D.-S. Huang et al. (Eds.): ICIC 2016, Part I, LNCS 9771, pp. 328–334, 2016. DOI: 10.1007/978-3-319-42291-6_32

The recent advent of high-throughput sequencing has enabled genome-wide measurements of RNA structure, including de novo structure prediction from a single sequence and comparative structure prediction from multiple homologous ncRNAs. Computational structure prediction has led to a rapid growth of RNA structure data over the last decade, in databases such as fRNAdb [2] and Rfam 12.0 [3] for short regulatory ncRNAs, and lncRNAdb for long non-coding RNAs. Identifying and validating regulatory RNA motifs involved in diverse cellular processes from these data is essential for a comprehensive understanding of RNA function [4]. Several methods have been developed to normalize RNA structure data. A solution with sublinear running time would require index-based structure modeling [5]. However, widely used index structures such as suffix trees, suffix arrays, or the FM-index have uns