Fuzzy soft subspace clustering method for gene co-expression network analysis


ORIGINAL ARTICLE

Fuzzy soft subspace clustering method for gene co-expression network analysis

Qiang Wang¹ • Guoliang Chen²

Received: 8 November 2015 / Accepted: 19 December 2015
© Springer-Verlag Berlin Heidelberg 2016

Abstract Gene expression clustering methods for building gene co-expression networks suffer greatly from the biological complexity of cells. This paper proposes a fuzzy soft subspace clustering method for detecting overlapping clusters of locally co-expressed genes that may participate in multiple cellular processes and take on different biological functions. The method extracts process-specific cluster subspaces and the interactions among different gene clusters, providing useful information for gene co-expression network analysis. Experiments on the yeast cell cycle benchmark microarray data show that the method is effective in extracting underlying biological relationships between genes and in enhancing gene co-expression network inference.

Keywords Bioinformatics • Fuzzy clustering • Subspace clustering • Gene co-expression network • Gene ontology

Qiang Wang [email protected]
Guoliang Chen [email protected]

¹ Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

² HPC Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

1 Introduction

Gene co-expression networks are helpful for revealing the underlying molecular operating mechanisms of a living system [1–3]. Such a network can be regarded as modular, with modules defined on groups of genes involved in a common subcellular process [4, 5]. Since there is evidence that genes sharing similar expression patterns are likely to be involved in the same regulatory process [6, 7], clustering techniques can be used to find co-expressed gene clusters that provide an abstract representation of biological modules, bridging the gap between co-expression and co-regulation [8, 9].

However, the biological complexity of organisms poses great challenges to gene expression clustering methods. On the one hand, environmental changes may cause a specific subset of co-regulated genes to be expressed coordinately under an appropriate subset of conditions, so that their products (proteins) function together to survive the new environment. In this situation, co-expression patterns are usually embedded in different feature subspaces and vary according to the cellular and experimental context, so traditional full-feature and hard subspace clustering methods fail to detect such "localities". On the other hand, when fed with different regulatory inputs, a gene can play multiple functional roles in various pathways. For instance, each gene is estimated on average to interact with four to eight other genes [10], and to be involved in 10 biological functions [11]. Thus, there is no clear boundary between biologically relevant gene clusters, and a gene should be assigned to multiple clusters possibly corresponding to different