Building Classification Models from Microarray Data with Tree-Based Classification Algorithms



Abstract. Building classification models plays an important role in DNA microarray data analyses. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. This paper investigates various aspects of building classification models from microarray data with tree-based classification algorithms, using Partial Least-Squares (PLS) regression as a feature selection method. Experimental results show that PLS regression is an appropriate feature selection method and that tree-based ensemble models are capable of delivering high-performance classification models for microarray data.
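To make the combination named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' procedure. It assumes one common way of using PLS for gene selection (ranking genes by the absolute value of the first PLS weight vector, which for a single response is proportional to the centred product X^T y) and substitutes a depth-one decision tree (a stump) for the tree-based ensembles studied in the paper. The toy expression matrix, the labels, and all function names are hypothetical.

```python
import math

def first_pls_weights(X, y):
    """First PLS weight vector for a single response: w is proportional to
    Xc^T yc (centred data), scaled to unit length."""
    n = len(X)
    col_means = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    w = []
    for j, col in enumerate(zip(*X)):
        w.append(sum((x - col_means[j]) * t for x, t in zip(col, yc)))
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]

def select_genes(X, y, k):
    """Indices of the k genes with the largest absolute first-component weight."""
    w = first_pls_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k])

def predict(stump, row):
    """Apply a stump (gene index, threshold, label assigned above threshold)."""
    j, t, above = stump
    return above if row[j] > t else 1 - above

def fit_stump(X, y, genes):
    """Depth-one tree restricted to the selected genes, chosen by training accuracy."""
    best_score, best_stump = -1, None
    for j in genes:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            for above in (0, 1):
                stump = (j, (lo + hi) / 2, above)
                score = sum(predict(stump, r) == label for r, label in zip(X, y))
                if score > best_score:
                    best_score, best_stump = score, stump
    return best_stump

# Hypothetical toy data: 6 samples (rows) by 8 genes (columns), binary labels.
# Real microarray data would have thousands of columns and still few rows.
X = [
    [0.2, 0.7, 1.0, 0.3, 0.9, 0.5, 0.4, 0.6],
    [0.4, 0.5, 1.2, 0.1, 0.8, 0.6, 0.5, 0.7],
    [0.3, 0.6, 0.9, 0.2, 1.0, 0.4, 0.3, 0.5],
    [0.3, 0.6, 3.0, 0.2, 0.8, 0.9, 0.5, 0.5],
    [0.2, 0.7, 3.1, 0.3, 1.0, 1.0, 0.3, 0.6],
    [0.4, 0.6, 2.8, 0.1, 0.9, 0.8, 0.4, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

genes = select_genes(X, y, k=2)   # step 1: reduce 8 genes to 2
stump = fit_stump(X, y, genes)    # step 2: fit a tree on the reduced data
new_sample = [0.3, 0.6, 2.9, 0.2, 0.9, 0.95, 0.4, 0.6]
print(genes, stump, predict(stump, new_sample))
```

Because the weights are computed on unscaled expression values, this ranking favours genes with large variance; standardising each gene first is a reasonable alternative when scales differ widely.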

1 Introduction

DNA microarrays simultaneously measure a large number of gene expression levels (often thousands or even tens of thousands) across several samples. The data collected from DNA microarrays are often called microarray data sets. Advances in statistical methods and machine learning techniques have played important roles in analysing microarray data sets. Results from such analyses have been fruitful and have provided powerful tools for studying the mechanisms of gene interaction and regulation in oncological and other studies. Within the bioinformatics research concerned with microarray data, two areas have been studied extensively. One is the design of algorithms that select, from a large number of genes, a small subset most relevant to the target concept for further scrutiny. The other is the construction of effective predictors capable of producing highly accurate predictions from diagnosis or prognosis data.

However, due to the way microarray data are collected, a microarray data set usually has a very limited number of samples. In a typical gene expression profile, the number of gene expressions (input variables) is substantially larger than the number of samples. Most standard statistical methods and machine learning algorithms are unable to cope with microarray data because they require the number of instances in a data set to be larger than the number of input variables. Therefore, many machine learning articles have proposed modified statistical methods and machine learning algorithms tailored to microarray analyses. As such, many proposed classification algorithms for microarray data have adopted various hybrid schemes. In these algorithms, the classification process usually has two steps, which we now outline. In the first step, the original gene expression data is fed into a dimensionality reduction algorithm, which reduces the number of input variables by either filtering out a large number of irrelevant input variables or building a small number of linear or nonlinear

P.J. Tan, D.L. Dowe, and T.I. Dix. In: M.A. Orgun and J. Thornton (Eds.): AI 2007, LNAI 4830, pp. 589–598, 2007. © Springer-Verlag Berlin Heidelberg 2007
