Protein Structure Alignment Using Maximum Cliques and Local Search

The protein structure alignment problem addresses the question of measuring the degree of similarity, in three–dimensional structure, of two proteins. The representation of each protein using a simple contact map allows the correspondence graph for the pr

PDF / 282,193 Bytes
5 Pages / 430 x 660 pts Page_size
96 Downloads / 243 Views

DOWNLOAD

REPORT

Abstract. The protein structure alignment problem addresses the question of measuring the degree of similarity, in three–dimensional structure, of two proteins. The representation of each protein using a simple contact map allows the correspondence graph for the protein pair to be generated and the maximum clique within this graph provides a measure of the structural similarity between the two proteins. This study uses a recently developed maximum clique algorithm, Phased Local Search (PLS), to locate the maximum cliques within correspondence graphs.

1 Introduction In bioinformatics, structural comparison of proteins is useful in several domains. For example, as protein function is intrinsically tied to a protein’s structure [1], identifying structural similarity between a protein and other proteins whose function is known can allow the prediction of that protein’s function. Over the last decade, a number of techniques for structurally comparing proteins have been developed however, none have proved adequate across a range of applications. A relatively new technique, Contact Map Overlap (CMO), first proposed in [2] (and subsequently shown to be NPcomplete [3]), is to identify alignments between protein contact maps with the goal of maximising the number of consistent alignments. A protein consists of a chain of residues (amino acids). When a protein folds into its tertiary (lowest energy) structure, residues that are not directly adjacent in the chain may be physically close in space. The contact map for a protein is a simple representation of this three–dimensional structure and is the matrix of all pairwise distances between the components of the protein. For this study, the components of the protein are identified as the alpha carbon atoms (Cα ) of each amino acid. The contact map can be further simplified into a 0–1 contact map by encoding each pairwise distance as one if the pairwise distance is less than some ˚ and 5.5 A ˚ in this study) and zero othdistance threshold (typically in the range 4 - 8 A erwise. As shown in [4], the CMO problem can be directly translated to a maximum clique (MC) problem which calls for finding the maximum sized sub-graph of pairwise adjacent vertices in a given graph. Formally, the MC problem can be stated as: Given an undirected graph G = (V, E), where V is the set of all vertices and E the set of all edges, find a maximum size clique in G, where a clique K in G is a subset of vertices, K ⊆ V , such that all pairs of vertices in K are connected by an edge, i.e., for M.A. Orgun and J. Thornton (Eds.): AI 2007, LNAI 4830, pp. 776–780, 2007. c Springer-Verlag Berlin Heidelberg 2007

Protein Structure Alignment Using Maximum Cliques

777

all v, v ∈ K, {v, v } ∈ E, and the size of the clique K is the cardinality |K| of K. The maximum clique problem is to maximise |K|, the cardinality of K. MC is NP-hard and the associated decision problem is NP-complete [5]. Therefore, large and hard instances of MC are typically solved using heuristic approaches of which the most recent is P

Data Loading...

Protein Structure Alignment Using Maximum Cliques and Local Search

Recommend Documents

Benchmarking Methods of Protein Structure Alignment

Local search for the maximum k -plex problem

Local Search Algorithms for the Maximum Carpool Matching Problem

AligNet: alignment of protein-protein interaction networks

Local Web Search

Distance-guided local search

Protein structure prediction problem: formalization using quaternions

Protein Structure and Modeling

Analysis of Social Networks Using Pseudo Cliques and Averaging

Local Search Algorithms for kSAT

Multiple alignment of transmembrane protein sequences

Local Alignment (with Concave Gap Weights)