Combined clustering models for the analysis of gene expression


ELEMENTARY PARTICLES AND FIELDS Theory

Combined Clustering Models for the Analysis of Gene Expression*

M. Angelova** and J. Ellman***
Northumbria University, Newcastle upon Tyne, UK
Received April 22, 2009

Abstract: Clustering has become one of the fundamental tools for analyzing gene expression and producing gene classifications. Clustering models make it possible to find patterns of similarity in order to understand gene function, gene regulation, cellular processes and sub-types of cells. The clustering results, however, have to be combined with sequence data or knowledge about gene functionality in order to draw biologically meaningful conclusions. In this work, we explore a new model that integrates gene expression with sequence or text information.

DOI: 10.1134/S1063778810020067

* The text was submitted by the authors in English.
** E-mail: [email protected]
*** E-mail: [email protected]

1. INTRODUCTION

Life sciences are currently undergoing an information revolution as a result of the development of techniques and tools that allow biological information to be collected in great detail and in large quantities. Microarray technology provides some of the most promising tools available to researchers today, as it allows the expression levels of thousands of genes to be measured simultaneously under controlled experimental conditions. The ability of this technology to take a snapshot of a whole gene expression pattern opens enormous possibilities. For example, DNA microarrays have been successfully used to study genome-wide patterns of gene expression [1–4] and are capable of providing fundamental insights into biological processes such as gene function and gene regulation [1, 2], cell cycle [1, 4], and cancer [2, 3]. The motivation for large-scale gene expression analysis lies in the central dogma of molecular biology [5, 6], which justifies the premise that information about the functional state of an organism is to a great extent determined by information on gene expression.

One of the most powerful automatic techniques for the analysis of high-throughput gene expression data is clustering [4]. It is the exploratory, unsupervised process of partitioning data into groups (clusters) by finding similarity patterns within gene expression data. An underlying assumption in clustering is that genes in a cluster are functionally related. This implies that many of the genes could also be coregulated and thus share transcription factor binding motifs in their upstream sequences [7].

Clustering results need to be evaluated against biologically significant information, such as previously known biological facts, theories and results. Biological and medical literature databases store such published information and can be used to cross-reference experimental and analytical results, and even to drive the interpretation and organization of the expression data [8, 9]. In this paper we discuss a combined model that integrates gene expression results with sequence data and published knowledge about gene functionalities in orde
