Fuzzy Classification for Gene Expression Data Analysis


School of Engineering and Applied Science, Aston University, U.K. Department of Computer Science and Intelligent Systems, Osaka Prefecture University, Japan

Summary. Microarray expression studies measure, through a hybridisation process, the levels of genes expressed in biological samples. Knowledge gained from these studies is deemed increasingly important because of its potential to contribute to the understanding of fundamental questions in biology and clinical medicine. One important aspect of microarray expression analysis is the classification of the recorded samples, which poses many challenges because the number of recorded expression levels is vast compared with the relatively small number of analysed samples. In this chapter we show how fuzzy rule-based classification can be applied successfully to analyse gene expression data. The generated classifier consists of an ensemble of fuzzy if-then rules which together provide a reliable and accurate classification of the underlying data. Experimental results on several standard microarray datasets confirm the efficacy of the approach.

G. Schaefer et al.: Fuzzy Classification for Gene Expression Data Analysis, Studies in Computational Intelligence (SCI) 94, 209–218 (2008). © Springer-Verlag Berlin Heidelberg 2008. www.springerlink.com

8.1 Introduction

Microarray expression studies measure, through a hybridisation process, the levels of genes expressed in biological samples. Knowledge gained from these studies is deemed increasingly important because of its potential to contribute to the understanding of fundamental questions in biology and clinical medicine. Microarray experiments can either monitor each gene several times under varying conditions or analyse the genes in a single environment but in different types of tissue. In this chapter we focus on the latter, where one important aspect is the classification of the recorded samples. Classification can be used either to categorise different types of cancerous tissue, as in [8], where different types of leukemia are identified, or to distinguish cancerous tissue from normal tissue, as in [2], where tumor and normal colon tissues are analysed.

One of the main challenges in classifying gene expression data is that the number of genes is typically much higher than the number of analysed samples. It is also not clear which genes are important and which can be omitted without reducing the classification performance.

Many pattern classification techniques have been employed to analyse microarray data. For example, Golub et al. [8] used a weighted voting scheme, Fort and Lambert-Lacroix [6] employed partial least squares and logistic regression techniques, whereas Furey et al. [7] applied support vector machines. Dudoit et al. [5] investigated nearest neighbour classifiers, discriminant analysis, classification trees and boosting, while Statnikov et al. [16] explored several support vector machine techniques, nearest neighbour classifiers, neural networks and probabilistic neural networks. In several of these studies it h