Developing an effective biclustering technique using an enhanced proximity measure
(2020) 9:6
REVIEW ARTICLE
Pallabi Patowary¹ · Rosy Sarmah¹ · Dhruba K. Bhattacharyya¹
Received: 30 May 2019 / Revised: 5 December 2019 / Accepted: 7 December 2019
© Springer-Verlag GmbH Austria, part of Springer Nature 2020
Abstract
This paper introduces an enhanced version of Pearson’s correlation coefficient (PCC) to achieve better biclustering-enabled co-expression analysis. The modified measure, called the local Pearson correlation measure (LPCM), helps detect shifting, scaling, and shifting-and-scaling correlation patterns effectively over gene expression data in the presence of outliers. An LPCM-based biclustering technique called the local correlation-based biclustering technique (LCBT) has also been proposed to identify biclusters of high biological significance. The biclustering results have been validated both statistically and biologically using benchmark gene expression data.
Keywords Microarray data · Clustering · Biclustering · P value · GO annotation · Proximity measure
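The three pattern types named in the abstract can be made concrete with a small numerical illustration. The sketch below is not the authors' LPCM (its definition is not given in this section); it uses the plain Pearson correlation coefficient on a hypothetical six-condition expression profile. The helper `pearson` and all profile values are assumptions introduced only for illustration. It shows that PCC is unaffected by shifting, scaling, and shifting-and-scaling, and that a single outlying value can destroy the correlation, which is the weakness an outlier-tolerant measure would need to address.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression profile of one gene across six conditions.
base = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0]

shifted = [v + 3.0 for v in base]              # additive (shifting) pattern
scaled = [v * 2.0 for v in base]               # multiplicative (scaling) pattern
shift_scaled = [v * 2.0 + 3.0 for v in base]   # shifting-and-scaling pattern

# The shifted profile again, with one condition replaced by an outlier.
with_outlier = list(shifted)
with_outlier[1] = 50.0

r_shift = pearson(base, shifted)
r_scale = pearson(base, scaled)
r_both = pearson(base, shift_scaled)
r_outlier = pearson(base, with_outlier)

print(f"shifting:             {r_shift:.3f}")
print(f"scaling:              {r_scale:.3f}")
print(f"shifting-and-scaling: {r_both:.3f}")
print(f"shifting + outlier:   {r_outlier:.3f}")
```

The first three coefficients equal 1 up to floating-point rounding, while the profile containing the outlier is no longer recognised as correlated with the base profile.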
* Dhruba K. Bhattacharyya · Pallabi Patowary · Rosy Sarmah
¹ Department of Computer Science and Engineering, Tezpur University, Assam, India

1 Introduction
Biclustering is a well-known unsupervised learning technique for analyzing gene expression data. It aims to identify functionally related sets of genes under different subsets of experimental samples or conditions (Cheng and Church 2000). Table 1 presents the basic concept of biclustering. Correlation is a mutual relationship or connection between two or more instances or genes. In bioinformatics, it measures the similarity between a pair of genes across various conditions in a gene expression dataset. A gene expression dataset contains the expression profile of each gene present in a tissue or sample under certain conditions or time points. There is a high probability of finding a subset of genes that follow the same characteristics across conditions. These characteristics may be observed from the pattern of the expression profiles across conditions. Sometimes, even though the expression values of a set of genes increase additively or multiplicatively, the genes still exhibit similar characteristics or correlation among themselves. Thus, similarity is a measure of the correlation among a set of genes across a set of conditions in a bicluster (Cheng and Church 2000).

Identifying sets of biologically significant genes that exhibit different types of correlation is a challenging task. A biclustering algorithm can identify biclusters with the different types of correlation (Ahmed et al. 2014) listed in Table 2. A bicluster following the absolute or constant type of correlation is shown in Table 3a. The constant values might be present in a bicluster row-wise or column-wise, as shown in Table 3b, c. A bicluster with shifting correlation is shown in Table 3d. A bicluster with scaling correlation is presented in Table 3e, and Table 3f presents a bicluster exhibiting shi