Frequency spectra characterization of noncoding human genomic sequences


Online ISSN 2092-9293 Print ISSN 1976-9571

RESEARCH ARTICLE

O. Paredes¹ · Rebeca Romo‑Vázquez¹ · Israel Román‑Godínez¹ · Hugo Vélez‑Pérez¹ · Ricardo A. Salido‑Ruiz¹ · J. Alejandro Morales¹

Received: 22 January 2019 / Accepted: 27 April 2020
© The Genetics Society of Korea 2020

Abstract

Background  Noncoding sequences have been demonstrated to possess regulatory functions. Their classification is challenging because they do not show well-defined nucleotide patterns that correlate with their biological functions. Genomic signal processing techniques such as the Fourier transform have been employed to characterize coding and noncoding sequences. Applying this transform to a systematic whole-genome noncoding library, such as the ENCODE database, can provide evidence of periodic behaviour in noncoding sequences that correlates with their regulatory functions.

Objective  The objective of this study was to classify different noncoding regulatory regions through their frequency spectra.

Methods  We applied machine learning algorithms to classify the frequency spectra of noncoding regulatory sequences.

Results  Sequences from different regulatory regions, cell lines, and chromosomes possessed distinct frequency spectra, and machine learning classifiers (such as support vector machines) could successfully discriminate among regulatory regions, thus correlating the frequency spectra with their biological functions.

Conclusion  Our work supports the idea that there are patterns in the noncoding sequences of the genome.

Keywords  Human genome · ENCODE · Genomic signal processing · Spectral classification · Noncoding sequence Fourier analysis
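As an illustration of what a frequency spectrum of a nucleotide sequence can look like, the following minimal sketch computes a Fourier power spectrum in plain Python. It assumes a binary indicator mapping (one 0/1 signal per nucleotide), which is a common choice in genomic signal processing but is not stated here to be the numerical mapping used in this study; the function names `indicator_sequences`, `dft_power`, and `power_spectrum` are hypothetical and introduced only for this example.

```python
import cmath


def indicator_sequences(seq):
    """Map a DNA string to four 0/1 signals, one per nucleotide (assumed mapping)."""
    seq = seq.upper()
    return {base: [1.0 if s == base else 0.0 for s in seq] for base in "ACGT"}


def dft_power(x):
    """Power |X[k]|^2 of the discrete Fourier transform of x (direct O(N^2) sum)."""
    n = len(x)
    power = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    return power


def power_spectrum(seq):
    """Sum the power spectra of the four indicator signals."""
    total = [0.0] * len(seq)
    for x in indicator_sequences(seq).values():
        for k, p in enumerate(dft_power(x)):
            total[k] += p
    return total


# Toy input: a perfectly 3-periodic sequence of length 30.
spectrum = power_spectrum("ATG" * 10)

# Strongest non-zero frequency bin in the first half of the spectrum.
peak = max(range(1, len(spectrum) // 2 + 1), key=lambda k: spectrum[k])
period = len(spectrum) / peak
```

For this toy sequence the strongest bin corresponds to a period of three nucleotides. In a classification setting, a vector such as `spectrum` would serve as the feature vector passed to a classifier; the direct summation is used here only to keep the sketch dependency-free, and a fast Fourier transform would be preferable for long sequences.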

* J. Alejandro Morales [email protected]

¹ Computer Sciences Department, Universidad de Guadalajara, Guadalajara, Mexico

Introduction

Historically, noncoding regions have not been studied as deeply as coding sequences; this is understandable because the products of coding sequences are proteins, whose differential expression gives the cell its properties, such as pluripotency, specialization, or tissue generation, among others (Alexander et al. 2010). This differential expression, however, is the result of the interaction of several genomic regulatory elements that lie inside the noncoding portions of the genome. Therefore, the study of noncoding regions is fundamental to understanding system function (Pennisi 2012; The ENCODE Project Consortium 2012). Gene promoters were the first noncoding regions recognized, followed by enhancers, repressors, and many other sequences with specific functions related to gene expression. Gradually, as knowledge of gene regulation expanded, this zoo of sequences became a muddle of functions too complex to catalogue. The ENCODE project set out to categorize human noncoding regions on the basis of their epigenomic characteristics (The ENCODE Project Consortium 2011). One objective was to understand differential gene expression across speci