A Non-stochastic Method for Clustering of Big Genomic Data
National Superior Institute of Computer Science (ESI), Algiers, Algeria [email protected] 2 LIRE Laboratory, Constantine 2 University, Constantine, Algeria [email protected]
Abstract. DNA microarray technology simultaneously monitors the expression profiles of thousands of genes over various experimental conditions. Identifying co-expressed genes and coherent patterns is the central goal of the clustering process, and it is an important task in bioinformatics because it helps biologists gain insight into gene functions: genes with similar functions exhibit similar expression patterns. Clustering is also used to identify cancer subtypes from gene expression and DNA methylation datasets, since cancer subtype information is critically important for understanding tumor heterogeneity. Detecting previously unknown clusters of biological samples, which are usually associated with unknown types of cancer, in turn makes it possible to prescribe more effective treatments for patients, as different cancer subtypes often respond differently to the same treatment. Because DNA methylation databases are extremely large-scale datasets, running time remains a major challenge. Almost all of the proposed algorithms are stochastic. This characteristic becomes a serious issue with high-dimensional biological datasets, since the biologist needs to run the algorithm several times and then take the mean of the results obtained from the runs; hence these algorithms usually require large amounts of computational time. The proposed clustering algorithm is purely non-stochastic and is able to accurately identify a set of biologically relevant clusters in large-scale DNA datasets, so the biologist needs to run the algorithm just once.
Keywords: Bioinformatics · Microarray gene expression · DNA methylation · Stochastic clustering · Non-stochastic clustering · Running time
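The contrast between stochastic and non-stochastic clustering drawn in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: it uses plain k-means on one-dimensional toy values, and the function names (`kmeans_1d`, `random_start`, `quantile_start`), the toy data, and the quantile-based initialization are all assumptions chosen for illustration. A randomly seeded start may yield different centers from run to run, which is why several runs are needed, whereas a start computed from the data alone gives the same answer on every call.

```python
import random


def kmeans_1d(values, centers, iters=50):
    """Lloyd's iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center (ties go to the first).
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recompute each center; keep the old one if its group is empty.
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def random_start(values, k, seed):
    """Stochastic start: k distinct data points chosen by a seeded RNG."""
    return random.Random(seed).sample(values, k)


def quantile_start(values, k):
    """Deterministic start (illustrative assumption): evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]


# Hypothetical expression levels forming three well-separated groups.
expression = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
k = 3

# Stochastic variant: the result depends on the seed, so it is run many times.
stochastic_runs = {
    tuple(round(c, 3) for c in kmeans_1d(expression, random_start(expression, k, seed)))
    for seed in range(20)
}

# Non-stochastic variant: a single run, identical on every call.
deterministic = kmeans_1d(expression, quantile_start(expression, k))

print("distinct outcomes over 20 random starts:", len(stochastic_runs))
print("deterministic centers:", deterministic)
```

With a data-dependent start there is nothing to average over, so the cost is one run instead of many; on large methylation datasets that difference is the running-time saving the abstract refers to.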
1 Background
1.1 Bioinformatics
Bioinformatics is the application of computer technology to the management of molecular biology. Its ultimate goal is to better understand a living cell and how it functions at the molecular level using computational tools. The work starts with storing and mining raw genomic data, proceeds to analyzing and interpreting the relations found within the data, and ends with deducing information and discovering
© Springer Nature Switzerland AG 2019 Y. Farhaoui and L. Moussaid (Eds.): ICBDSDE 2018, SBD 53, pp. 75–85, 2019. https://doi.org/10.1007/978-3-030-12048-1_10
B. Kenidra and M. Benmohammed
meaningful knowledge from it. This knowledge is crucial for making the right decisions on diagnosis and prognosis, and for generating new insights and a global perspective on the cell, with the aim of exploring the genetic relationships of deadly diseases. With the growth of genomic datasets, it has become important to develop techniques that are both fast and accurate, so that users can quickly extract meaningful insight. Computational tools