Mustguseal and Sister Web-Methods: A Practical Guide to Bioinformatic Analysis of Protein Superfamilies

Bioinformatic analysis of functionally diverse superfamilies can help to study the structure-function relationship in proteins, but represents a methodological challenge. The Mustguseal web-server can build large structure-guided sequence alignments of th



Dmitry Suplatov et al.

Kazutaka Katoh (ed.), Multiple Sequence Alignment: Methods and Protocols, Methods in Molecular Biology, vol. 2231, https://doi.org/10.1007/978-1-0716-1036-7_12, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Introduction

Understanding the relationship between protein sequence/structure and biological function is one of the most complex problems in modern biology. During the evolution of proteins from a common ancestor, one functional property may be preserved while others vary as a result of mutations introduced into the protein structure, leading to functional diversity. Comparative analysis of homologs that implement different properties within the common structural framework of a superfamily can help to understand the relationship between protein structure, function, and regulation [1]. However, such analysis represents a methodological and computational challenge, because both sequences and structures have to be taken into account to accurately superimpose evolutionarily distantly related proteins [2–4].

Only a handful of tools address this issue, the most recent being MAFFT-DASH [2] and Mustguseal [3]. MAFFT-DASH supports the situation in which no prior information about the 3D-structures of homologs is available. Mustguseal is useful when a specific query protein is to be analyzed in the context of its superfamily (e.g., for protein engineering or for the annotation of novel drug-binding sites).

Mustguseal and its integrated sister web-servers form a collection of open-access methods available at https://biokinet.belozersky.msu.ru/m-platform. The aim of this online platform is to provide an easy-to-use, comprehensive solution for the systematic bioinformatic analysis of protein superfamilies.

The key web-server, Mustguseal, can automatically collect and align thousands of protein sequences and structures within a superfamily to produce a large structure-guided sequence alignment. Four types of bioinformatic algorithms (database search and multiple alignment, each by sequence and by 3D-structure comparison) are implemented to take into account the vast variability of proteins within a superfamily: superimposition of protein 3D-structures, which are known to be more conserved among homologs throughout evolution, is used to match distant relatives, whereas alignment of amino acid sequences is used to match close homologs. The Mustguseal protocol starts from the 3D-structure of a query protein (i.e., a member of the superfamily of interest) and consists of four major steps (Fig. 1).
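The principle stated above (structure superimposition for distant relatives, sequence alignment for close homologs) can be illustrated with a simplified merging step. The sketch below is not the Mustguseal implementation. It assumes that a core structure-based alignment of representatives and, for each representative, a sequence alignment of its close homologs are already available, and it places every homolog into the core columns by using its representative as an anchor. Residues that a homolog carries in insertions relative to its representative are simply dropped, a simplifying assumption made here for brevity. All function names and sequences are hypothetical.

```python
from typing import Dict, List

GAP = "-"


def residue_columns(row: str) -> List[int]:
    """Column index of every residue (non-gap character) in a gapped row."""
    return [col for col, ch in enumerate(row) if ch != GAP]


def project_onto_representative(rep_row: str, member_row: str) -> Dict[int, str]:
    """Map residue index i of the representative to the member's character in that column.

    Both rows come from the same family (sequence) alignment. Columns where the
    representative has a gap are skipped, so member insertions are dropped.
    """
    mapping: Dict[int, str] = {}
    rep_index = 0
    for rep_ch, mem_ch in zip(rep_row, member_row):
        if rep_ch == GAP:
            continue  # insertion relative to the representative: dropped in this sketch
        mapping[rep_index] = mem_ch
        rep_index += 1
    return mapping


def merge_families(core: Dict[str, str],
                   families: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Place close homologs into the columns of a core structure-based alignment.

    core:     representative name -> gapped row from the structural alignment
    families: representative name -> {sequence name -> gapped row} from a sequence
              alignment that includes the representative under the same name
    """
    merged = dict(core)
    width = len(next(iter(core.values())))
    for rep, family in families.items():
        core_cols = residue_columns(core[rep])  # where each rep residue sits in the core
        for name, row in family.items():
            if name == rep:
                continue
            mapping = project_onto_representative(family[rep], row)
            out = [GAP] * width
            for rep_index, col in enumerate(core_cols):
                out[col] = mapping.get(rep_index, GAP)
            merged[name] = "".join(out)
    return merged


# Hypothetical toy data: two representatives and their close homologs.
core = {"repA": "MK-LV", "repB": "MRQLI"}
families = {
    "repA": {"repA": "MKLV", "homA1": "MRLV", "homA2": "M-LV"},
    "repB": {"repB": "MRQ-LI", "homB1": "MKQALI"},
}
for name, row in merge_families(core, families).items():
    print(f"{name:6s} {row}")
```

In this toy case homA1 becomes MR-LV, inheriting the structural gap of repA, while the residue A that homB1 inserts relative to repB has no counterpart in the core and is dropped, giving MKQLI. A production tool would instead need to open new columns for such insertions.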
First, the structure similarity search by the SSM algorithm [5] is used to collect evolutionarily distantly related proteins that have lost sequence similarity during natural selection and specialization from a common ancestor. These representative proteins are expected to introduce different protein families into the alignment. Then, a 3D-superimposition of the collected structures is performed by the MATT algorithm [4, 6] to create th
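The goal of the first step, collecting structurally similar proteins that are nevertheless dissimilar in sequence so that each one represents a different family, can be sketched as a greedy redundancy filter. This is one plausible way to do it, not the documented Mustguseal procedure. The sketch assumes the structure-search hits are already ranked by structural similarity to the query and that pairwise sequence identities are available. The function name `select_representatives`, the 0.3 identity threshold, and the toy identity values are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def select_representatives(hits: Sequence[str],
                           identity: Callable[[str, str], float],
                           max_identity: float = 0.3) -> List[str]:
    """Greedy redundancy filter over ranked structure-search hits.

    A hit is kept only if its sequence identity to every representative kept
    so far is below max_identity (a hypothetical threshold), so the selected
    proteins tend to come from different families.
    """
    reps: List[str] = []
    for hit in hits:
        if all(identity(hit, rep) < max_identity for rep in reps):
            reps.append(hit)
    return reps


# Made-up pairwise sequence identities (fractions) between four hypothetical hits.
PAIRWISE: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 0.85,
    frozenset({"A", "C"}): 0.20,
    frozenset({"A", "D"}): 0.15,
    frozenset({"B", "C"}): 0.22,
    frozenset({"B", "D"}): 0.18,
    frozenset({"C", "D"}): 0.45,
}


def toy_identity(x: str, y: str) -> float:
    """Look up the (symmetric) toy identity between two hits."""
    return PAIRWISE[frozenset({x, y})]


print(select_representatives(["A", "B", "C", "D"], toy_identity))
```

With these values, B is discarded as a close relative of A, and D as a close relative of C, leaving A and C as representatives. Because the filter is greedy, the ranking of the hits determines which member of a redundant group survives.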