Bagging Support Vector Machine for Classification of SELDI-ToF Mass Spectra of Ovarian Cancer Serum Samples
1 School of Computer Science and Mathematics, Victoria University, VIC 3011, Australia [email protected]
2 School of Information Technology, James Cook University, QLD 4811, Australia
Abstract. Much progress has been made recently in identifying diagnostic proteomic signatures for different human cancers using surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF) mass spectrometry. To identify proteomic patterns in serum that discriminate cancer patients from normal individuals, many classification methods have been tried, often with successful results. Most of these earlier studies, however, apply classifiers directly to the original mass spectra, together with dimension reduction methods such as PCA or feature selection methods such as t-tests. Because only the peaks of MS data correspond to potential biomarkers, it is important to study classification methods that use the detected peaks. This paper investigates ovarian cancer identification from the detected MS peaks by applying Bagging Support Vector Machine, a special strategy of bootstrap aggregating (bagging). In bagging SVM, each individual SVM is trained independently on training samples chosen at random via a bootstrap technique. The trained SVMs are then aggregated to make a collective decision, for example by majority voting. Using the detected peaks, bagged SVM achieved 94% accuracy, with 95% sensitivity and 92% specificity. The efficiency can be further improved by applying PCA to reduce the dimension.
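The bagging procedure summarised in the abstract (bootstrap resampling, independent training, majority voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the base learner is a linear soft-margin SVM trained by stochastic sub-gradient descent, chosen only to keep the sketch dependency-free, and all function names, parameter values, and the toy data are hypothetical. Labels are assumed to be +1 (cancer) and -1 (normal), and each sample is assumed to be a vector of detected peak intensities.

```python
import random
from collections import Counter


def train_linear_svm(X, y, rng, lam=0.01, lr=0.1, epochs=20):
    """Linear soft-margin SVM via SGD on the L2-regularised hinge loss.

    Stand-in base learner (an assumption); labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Shrinkage step from the L2 penalty on w (the bias is not penalised).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step only when the sample violates the margin.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def bagging_fit(X, y, n_estimators=11, seed=0):
    """Train each SVM on its own bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            train_linear_svm([X[i] for i in idx], [y[i] for i in idx], rng)
        )
    return models


def predict_one(model, x):
    """Sign of the decision function of a single linear SVM."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def bagging_predict(models, x):
    """Aggregate the individual SVM decisions by majority vote."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy, well-separated two-feature data standing in for peak intensities.
    X = [(2.0 + 0.1 * i, 2.5 - 0.05 * i) for i in range(10)]
    X += [(-a, -c) for a, c in X]
    y = [1] * 10 + [-1] * 10
    models = bagging_fit(X, y)
    preds = [bagging_predict(models, x) for x in X]
    print(sensitivity_specificity(y, preds))
```

An odd number of estimators is used so that a two-class majority vote cannot tie. In practice a kernel SVM from an established library would normally replace the toy base learner, and the metrics would be computed on held-out samples rather than on the training data.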
1 Introduction
In the last decade, mass spectrometry (MS) based technologies have emerged as the method of choice for the study of proteins, which is the main theme of proteomics and an integral part of understanding biological functions and structures at the protein level. High-throughput proteomics techniques based on mass spectrometry hold great promise for bulk analysis of biological material to accurately identify bacteria. In clinical proteomics, they have been rapidly adapted to biomedical applications, for example cancer prediction on the basis of peptide/protein intensities [1-5, 6-7, 13]. This research is important because earlier detection of cancer is critical for improving patient survival rates.

M.A. Orgun and J. Thornton (Eds.): AI 2007, LNAI 4830, pp. 820–826, 2007. © Springer-Verlag Berlin Heidelberg 2007
The aim of mass spectrometry is to provide information about the amount of each specific molecule in a sample, where each molecule is characterised by its mass. Mass spectrometry measures two properties of ion mixtures in the gas phase under vacuum: the mass/charge ratio (m/z) of the ionized proteins in the mixture and the number of ions present at different m/z values. The m/z values and the intensity measurement that indicates the (relative) abundance of the particle are represented on the horizontal axis and vertical axis, r