Protein Structure Alignment Using Maximum Cliques and Local Search

The protein structure alignment problem addresses the question of measuring the degree of similarity, in three–dimensional structure, of two proteins. The representation of each protein using a simple contact map allows the correspondence graph for the pr

  • PDF / 282,193 Bytes
  • 5 Pages / 430 x 660 pts Page_size
  • 96 Downloads / 192 Views

DOWNLOAD

REPORT


Abstract. The protein structure alignment problem addresses the question of measuring the degree of similarity, in three–dimensional structure, of two proteins. The representation of each protein using a simple contact map allows the correspondence graph for the protein pair to be generated and the maximum clique within this graph provides a measure of the structural similarity between the two proteins. This study uses a recently developed maximum clique algorithm, Phased Local Search (PLS), to locate the maximum cliques within correspondence graphs.

1 Introduction In bioinformatics, structural comparison of proteins is useful in several domains. For example, as protein function is intrinsically tied to a protein’s structure [1], identifying structural similarity between a protein and other proteins whose function is known can allow the prediction of that protein’s function. Over the last decade, a number of techniques for structurally comparing proteins have been developed however, none have proved adequate across a range of applications. A relatively new technique, Contact Map Overlap (CMO), first proposed in [2] (and subsequently shown to be NPcomplete [3]), is to identify alignments between protein contact maps with the goal of maximising the number of consistent alignments. A protein consists of a chain of residues (amino acids). When a protein folds into its tertiary (lowest energy) structure, residues that are not directly adjacent in the chain may be physically close in space. The contact map for a protein is a simple representation of this three–dimensional structure and is the matrix of all pairwise distances between the components of the protein. For this study, the components of the protein are identified as the alpha carbon atoms (Cα ) of each amino acid. The contact map can be further simplified into a 0–1 contact map by encoding each pairwise distance as one if the pairwise distance is less than some ˚ and 5.5 A ˚ in this study) and zero othdistance threshold (typically in the range 4 - 8 A erwise. As shown in [4], the CMO problem can be directly translated to a maximum clique (MC) problem which calls for finding the maximum sized sub-graph of pairwise adjacent vertices in a given graph. Formally, the MC problem can be stated as: Given an undirected graph G = (V, E), where V is the set of all vertices and E the set of all edges, find a maximum size clique in G, where a clique K in G is a subset of vertices, K ⊆ V , such that all pairs of vertices in K are connected by an edge, i.e., for M.A. Orgun and J. Thornton (Eds.): AI 2007, LNAI 4830, pp. 776–780, 2007. c Springer-Verlag Berlin Heidelberg 2007 

Protein Structure Alignment Using Maximum Cliques

777

all v, v  ∈ K, {v, v  } ∈ E, and the size of the clique K is the cardinality |K| of K. The maximum clique problem is to maximise |K|, the cardinality of K. MC is NP-hard and the associated decision problem is NP-complete [5]. Therefore, large and hard instances of MC are typically solved using heuristic approaches of which the most recent is P