Clustering Time Series Gene Expression Data Based on Sum-of-Exponentials Fitting

Ciprian Doru Giurcăneanu
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: [email protected]

Ioan Tăbuş
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: [email protected]

Jaakko Astola
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: [email protected]

Received 8 June 2004; Revised 26 October 2004; Recommended for Publication by Xiaodong Wang

This paper presents a method for clustering time series of gene expression data, based on fitting a sum-of-exponentials model to the nonuniformly sampled data. The structure of the model is estimated by using the minimum description length (MDL) principle for nonlinear regression, in a new form, incorporating a normalized maximum-likelihood (NML) model for a subset of the parameters. The performance of the structure estimation method is studied using simulated data, and the superiority of the new selection criterion over earlier criteria is demonstrated. The accuracy of the nonlinear estimates of the model parameters is analyzed with respect to the Cramér-Rao lower bounds. Clustering examples of gene expression data sets from a developmental biology application are presented, revealing gene grouping into clusters according to functional classes.

Keywords and phrases: nonuniformly sampled data, sum-of-exponentials model, normalized maximum likelihood, time series clustering, gene expression data, developmental biology.
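To make the notion of fitting a sum-of-exponentials model to nonuniformly sampled data concrete, the following is a minimal sketch and not the estimation procedure of this paper. It assumes a model of the form y(t) = a_1 exp(-b_1 t) + ... + a_K exp(-b_K t) with real decay rates b_k, which is one common reading of "sum of exponentials". For fixed rates the amplitudes a_k enter linearly, so they are obtained by ordinary least squares; the rates are then chosen by a coarse grid search, which stands in for a proper nonlinear estimator. The function names, the rate grid, and the synthetic noiseless data are all hypothetical, and the model order K is fixed by hand rather than selected with an MDL/NML criterion.

```python
import math
from itertools import combinations


def design_matrix(times, rates):
    """One column exp(-rate * t) per rate, one row per (possibly nonuniform) time."""
    return [[math.exp(-r * t) for r in rates] for t in times]


def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_amplitudes(times, values, rates):
    """Least-squares amplitudes for fixed rates (normal equations), plus the residual sum of squares."""
    phi = design_matrix(times, rates)
    k = len(rates)
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(phi, values)) for i in range(k)]
    amps = solve_linear(gram, rhs)
    rss = sum((y - sum(a * p for a, p in zip(amps, row))) ** 2
              for row, y in zip(phi, values))
    return amps, rss


def fit_sum_of_exponentials(times, values, rate_grid, order):
    """Grid search over rate tuples (assumed stand-in for nonlinear estimation)."""
    best = None
    for rates in combinations(rate_grid, order):
        amps, rss = fit_amplitudes(times, values, rates)
        if best is None or rss < best[2]:
            best = (rates, amps, rss)
    return best


# Hypothetical nonuniform sampling times and noiseless synthetic data.
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
true_rates, true_amps = (0.1, 1.0), (2.0, -1.5)
values = [sum(a * math.exp(-r * t) for a, r in zip(true_amps, true_rates))
          for t in times]

rate_grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
rates, amps, rss = fit_sum_of_exponentials(times, values, rate_grid, order=2)
```

Because the sampling times enter only through the design matrix, the same code handles uniform and nonuniform sampling alike. With noisy data, the normal equations can become ill-conditioned when two candidate rates are close, so a QR-based least-squares solver would be a safer choice in practice.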

1. INTRODUCTION

Gene expression time profiles are a rich source of information about the dynamics of the underlying genomic network. The measurements are often taken at nonuniform time points, chosen according to the biologist's intuition about the time scale of the important changes in the analyzed biological process, for example, a developmental process or the administration of a drug. Clustering the time profiles of the thousands of genes recorded by microarrays is a very important exploratory problem, for which several methods have been proposed in the past [1, 2, 3]. Most of the existing methods, whether heuristically motivated or model-based [4], do not make use of the time values at which the measurements were taken, losing potentially useful information about the analyzed waveforms. Some approaches that take into account the temporal structure in gene expression data are based on hidden Markov models [5], spline approximation [6], or analysis of temporal variation [7]. In [8], an autoregressive model is used for the gene expression time series, and the clustering is performed with a Bayesian criterion which measures the similarity between two time series. A comprehensive study of various clustering methods applied to time series gene expression data can be found in [9]. A general methodology for modelling the time series collected at no
