Discovery of bidirectional contiguous column coherent bicluster in time-series gene expression data
ORIGINAL ARTICLE
Yun Xue1 • Zhihao Ma1 • Huixin Xu2 • Zhihao Lu3 • Xiaohui Hu1 • Chaoyi Pang4
Received: 28 February 2015 / Accepted: 19 November 2015 / © Springer-Verlag Berlin Heidelberg 2015
Correspondence: Yun Xue • Xiaohui Hu
1 Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication, South China Normal University, Guangdong, China
2 BGI-Shenzhen, Shenzhen, China
3 Computer School, South China Normal University, Guangdong, China
4 NIT, Zhejiang University, Zhejiang, China
Abstract The application of high-throughput microarrays has produced massive amounts of gene expression data, calling for effective analysis methods. Biclustering serves as a useful tool here: it clusters rows and columns simultaneously to find subsets of genes that are coherently expressed under subsets of conditions. In particular, when analyzing time-series gene expression data, it is meaningful to restrict biclusters to contiguous time points with coherent evolutions. In this paper, the BCCC-Bicluster is proposed as an extension of the CCC-Bicluster. An exact algorithm based on frequent sequential mining is proposed to find all maximal BCCC-Biclusters. The newly defined Frequent-Infrequent Tree-Array (FITA) is constructed to speed up the traversal process, with strategies derived from the Apriori property to avoid redundant work. For further efficiency, the bitwise XOR operation is applied to capture identical or opposite contiguous patterns between two rows. The algorithm is tested on simulated data, yeast microarray data, and human microarray data. The experimental results show that the proposed algorithm recovers the planted biclusters in the synthetic data better than CCC-Biclusters and outperforms the version without FITA in speed and scalability. In the enrichment analysis, BCCC-Biclusters are shown to find more significant GO terms involved in biological processes than three other kinds of up-to-date biclusters.
Keywords Bicluster • Time series • Gene expression data • Coherent evolution • Frequent sequential mining • Bitwise operation
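The abstract's use of XOR to capture identical or opposite contiguous patterns between two rows can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a simplified binary encoding (1 for a rise between consecutive time points, 0 otherwise), and the function names `encode_trends`, `pack_bits` and `xor_runs` are hypothetical. Under that assumption, a run of 0 bits in the XOR of two rows marks a stretch where both rows follow the same trend, and a run of 1 bits marks a stretch where they follow opposite trends.

```python
def encode_trends(row):
    """Encode each consecutive change as a bit: 1 = rise, 0 = fall or no change.

    The binary alphabet is a simplifying assumption made for this sketch.
    """
    return [1 if later > earlier else 0 for earlier, later in zip(row, row[1:])]


def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, first bit most significant."""
    packed = 0
    for bit in bits:
        packed = (packed << 1) | bit
    return packed


def xor_runs(bits_a, bits_b):
    """Return maximal runs as (start, length, kind) from a single XOR of two rows.

    kind is "identical" where the XOR bit is 0 and "opposite" where it is 1.
    """
    width = min(len(bits_a), len(bits_b))
    x = pack_bits(bits_a[:width]) ^ pack_bits(bits_b[:width])
    # Unpack the XOR result back into per-transition bits, left to right.
    xor_bits = [(x >> (width - 1 - i)) & 1 for i in range(width)]

    runs = []
    start = 0
    for i in range(1, width + 1):
        # Close the current run at the end of the row or when the bit flips.
        if i == width or xor_bits[i] != xor_bits[start]:
            kind = "identical" if xor_bits[start] == 0 else "opposite"
            runs.append((start, i - start, kind))
            start = i
    return runs


# Two made-up expression profiles over six time points (illustrative data only).
gene_a = [1.0, 2.0, 3.0, 2.5, 1.0, 1.5]
gene_b = [5.0, 4.0, 3.0, 3.5, 4.0, 4.5]
print(xor_runs(encode_trends(gene_a), encode_trends(gene_b)))
# prints [(0, 4, 'opposite'), (4, 1, 'identical')]
```

In this toy example the two profiles move in opposite directions across the first four transitions and in the same direction on the last one, so a bidirectional search could group them over the first five time points while a one-directional search would not.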
1 Introduction
Numerous recent high-throughput developments in DNA chip technology have revolutionized the experimental study of gene expression and produced massive amounts of gene expression data. Such data are represented as a matrix D of real numbers, shown in Fig. 1, with rows representing genes and columns representing different time points, different environmental conditions, different organs or even different individuals. Each element represents the expression level of a gene under a specific condition (time). Matrix D may conceal significant information about the biological mechanisms of humans and other organisms, for example the transcriptional networks of different organisms and the functional regulators of some complex cellular processes. Extracting biologically relevant information from this kind of data is a challenging, cru