A new FCA-based method for identifying biclusters in gene expression data
ORIGINAL ARTICLE
Amina Houari¹ · Wassim Ayadi² · Sadok Ben Yahia¹
Received: 13 July 2015 / Accepted: 26 February 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018
Abstract
Biclustering has proven highly relevant to gene expression data analysis. Its main strength lies in its ability to identify groups of genes that behave in the same way under a subset of samples (conditions). However, the pioneering algorithms in the literature have shown limits in the quality of the biclusters they unveil. In this paper, we introduce a new algorithm, called BiFCA+, for biclustering microarray data. BiFCA+ relies heavily on the mathematical background of formal concept analysis to extract the set of biclusters. In addition, the Bond correlation measure is used to filter out overlapping biclusters. Extensive experiments, carried out on real-life datasets, shed light on BiFCA+'s ability to identify statistically and biologically significant biclusters.
Keywords Biclustering · Formal concept analysis · Data mining · Bioinformatics · DNA microarray data · Bond correlation measure
* Wassim Ayadi [email protected]
Amina Houari [email protected]
Sadok Ben Yahia [email protected]
¹ Faculty of Sciences of Tunis, University of Tunis El Manar, LIPAH-LR11ES14, 2092 Tunis, Tunisia
² National Higher Engineering School of Tunis, University of Tunis, LaTICE-LR11ES04, 1008 Tunis, Tunisia

1 Introduction
A biological network is a linked collection of biological entities such as genes, proteins, and metabolites [34]. Analyzing information about these entities and extracting biologically relevant knowledge from them is one of the key issues of bioinformatics. For instance, DNA microarray technologies help to measure the expression levels of thousands of genes under experimental conditions [14]. To this end, gene expression data are arranged in a data matrix in which rows represent genes, columns represent samples (experimental conditions), and each entry denotes the expression level of a gene under a certain experimental condition. In this respect, the discovery of transcriptional modules of genes that are co-regulated in a set of experiments is of paramount importance [14]. Clustering has been beneficial in many bioinformatics challenges. In fact, it allows researchers to gather information such as cancer occurrences, specific tumor subtypes and cancer survival rates [67]. However, the use of clustering algorithms has two major drawbacks.
1. They consider the whole set of samples, even though genes may not be relevant to every sample. Instead, they can be relevant to only a subset of samples, which is a fundamental aspect of numerous problems in the biomedical field [66]. Thus, clustering should be performed simultaneously on both genes and conditions.
2. Each gene can only be clustered into one group. Never