A Non-stochastic Method for Clustering of Big Genomic Data


1 National Superior Institute of Computer Science (ESI), Algiers, Algeria [email protected]
2 LIRE Laboratory, Constantine 2 University, Constantine, Algeria [email protected]

Abstract. DNA microarray technology simultaneously monitors the expression profiles of thousands of genes over various experimental conditions. Identifying co-expressed genes and coherent patterns is the central goal of the clustering process, and it is an important task in bioinformatics because it helps biologists gain insight into gene functions: genes with similar functions exhibit similar expression patterns. Clustering is also used to identify cancer subtypes from gene expression and DNA methylation datasets, since cancer subtype information is critically important for understanding tumor heterogeneity. Detecting previously unknown clusters of biological samples, which are usually associated with unknown types of cancer, in turn makes it possible to prescribe more effective treatments, as different cancer subtypes often respond differently to the same treatment. Because DNA methylation databases are extremely large-scale datasets, running time remains a major challenge. Almost all algorithms proposed so far are stochastic, which becomes a serious issue when dealing with high-dimensional biological datasets: the biologist needs to run the algorithm several times and then take the mean of the results obtained across runs, so these algorithms usually require large amounts of computational time. The proposed clustering algorithm is purely non-stochastic and is able to accurately identify a set of biologically relevant clusters in large-scale DNA datasets; the biologist therefore needs to run the algorithm just once.

Keywords: Bioinformatics · Microarray gene expression · DNA methylation · Stochastic clustering · Non-stochastic clustering · Running time
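The contrast between stochastic and non-stochastic clustering can be illustrated with a toy sketch. It is not the authors' algorithm, which this excerpt does not describe: it uses plain k-means as a stand-in and compares random seeding with a deterministic farthest-first seeding. The data, the function names, and the seeding rule are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def random_seeds(points, k, rng):
    # Stochastic seeding: the result depends on the generator state.
    return rng.sample(points, k)

def farthest_first_seeds(points, k):
    # Deterministic seeding (illustrative choice, not the paper's method):
    # start from the point nearest the global mean, then repeatedly add
    # the point farthest from all seeds chosen so far.
    centre = mean(points)
    seeds = [min(points, key=lambda p: sq_dist(p, centre))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, seeds, iterations=20):
    # Standard Lloyd iterations from the given initial centroids.
    centroids = list(seeds)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points]

# Toy "expression profiles": three well-separated groups of 2-D points.
profiles = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
            (9.0, 0.2), (9.1, 0.0), (8.8, 0.1)]

# Deterministic seeding: every run starts from the same centroids.
deterministic_runs = [kmeans(profiles, farthest_first_seeds(profiles, 3))
                      for _ in range(5)]

# Random seeding: each run starts from a different random draw.
stochastic_runs = [kmeans(profiles, random_seeds(profiles, 3, random.Random(seed)))
                   for seed in range(5)]
```

With deterministic seeding, all entries of `deterministic_runs` are identical, so a single run suffices. With random seeding, the labelings in `stochastic_runs` may vary from one seed to the next, which is why such methods are typically run several times and their results aggregated.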

1 Background

1.1 Bioinformatics

The application of computer technology to the management of molecular biology is known as bioinformatics. The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular level using computational tools. The work starts with storing and mining raw genomic data, proceeds to analyzing and interpreting the relations found within the data, and ends with deducing information and discovering meaningful knowledge from it. This knowledge is crucial for making the right decisions on diagnosis and prognosis, and it can generate new insights and provide a global perspective on the cell, with the aim of exploring the genetic relationships of deadly diseases. With the growth of genomic datasets, it has become important to develop techniques that are both fast and accurate, so that users can quickly extract meaningful insight. Computational tools

B. Kenidra and M. Benmohammed

© Springer Nature Switzerland AG 2019 Y. Farhaoui and L. Moussaid (Eds.): ICBDSDE 2018, SBD 53, pp. 75–85, 2019. https://doi.org/10.1007/978-3-030-12048-1_10
