Data Analysis and Bioinformatics


Abstract. Data analysis methods and techniques are revisited in the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data sets with many attributes of different types, which is a typical situation in biology. Some case studies are also described.

Keywords: Clustering, data mining, bio-informatics, Kernel methods, Hidden Markov Models, Multi-Layers Model.

1 Introduction

Bio-informatics is a new discipline devoted to the solution of biological problems, usually on the molecular level, by the use of techniques including applied mathematics, statistics, computer science, and artificial intelligence. Major research efforts concern sequence alignment [1], gene finding [2], genome assembly, protein structure alignment [3] and prediction [4], prediction of gene expression, protein-protein interactions, and the modeling of evolution [5]. Mining in structured data is particularly relevant for bio-informatics applications, since the majority of biological data is not kept in databases consisting of a single, flat table [6]. In fact, bio-informatics databases (BDB) are structured and linked objects, connected by relations representing a rich internal structure. Examples of BDB are databases of proteins [7], of small molecules [8], and of metabolic and regulatory networks [9]. Moreover, biological data representations are structured and heterogeneous; they consist of large sequences (e.g. 10^6 gene sequences), large 2D structures (e.g. 10^5 ∼ 10^6 spots on DNA chips), 3D structures (e.g. the DNA phosphate model, Figure 1a), graphs, networks, expression profiles, and phylogenetic trees (Figure 1b).

Several approaches deal with mining biological data; among them are kernel methods for the classification of microarray time series data [10]. The classification of gene expression time series has many potential applications in medicine and pharmacogenomics, such as disease diagnosis, drug response prediction, or disease outcome prognosis, contributing to individualized medical treatment. Graph kernel representations of proteins have been designed to retrieve structural and bio-chemical information and to predict protein function. Feature graphs are considered to represent potential docking sites and to retrieve activity maps from 3D protein databases.

A. Ghosh, R.K. De, and S.K. Pal (Eds.): PReMI 2007, LNCS 4815, pp. 373–388, 2007.
© Springer-Verlag Berlin Heidelberg 2007
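To make the idea of kernel-based classification of expression time series more concrete, the following is a minimal sketch, not the method of [10]. It assumes a Gaussian (RBF) kernel on equal-length profiles and a simple Parzen-window style rule that assigns a query to the class with the highest mean kernel similarity. The function names, the `gamma` value, and the toy profiles are all hypothetical and chosen only for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of time points")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_similarity(x, profiles, gamma=0.5):
    """Average kernel value between x and every profile of one class."""
    return sum(rbf_kernel(x, p, gamma) for p in profiles) / len(profiles)


def classify(x, training, gamma=0.5):
    """Return the label whose training profiles are most similar to x.

    training maps each class label to a list of profiles.
    """
    return max(training, key=lambda label: mean_similarity(x, training[label], gamma))


# Hypothetical toy data: expression levels at four time points.
training = {
    "up": [[0.0, 0.5, 1.0, 1.5], [0.1, 0.6, 1.1, 1.4]],
    "down": [[1.5, 1.0, 0.5, 0.0], [1.4, 1.1, 0.4, 0.1]],
}
query = [0.2, 0.4, 0.9, 1.6]
predicted = classify(query, training)
print(predicted)
```

In practice the kernel would typically be plugged into a trained classifier such as a support vector machine, and a kernel designed for time series (for example one that tolerates temporal shifts) may be preferable to the plain RBF kernel assumed here.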

V. Di Gesù

Fig. 1. (a) 3D structure of the DNA phosphate model; (b) an example of a phylogenetic tree

The concept of similarity plays a relevant role in searching for both 2D and 3D shape matches in bio-molecular databases. For example, similar 3D shapes can be retrieved by using a similarity model based on 3D shape histograms, 3D surface segments, and parametric surface functions including paraboloid and trigonometric polynomials that approx
