A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm


METHODOLOGIES AND APPLICATION

Biswanath Chowdhury¹ • Gautam Garai²

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Multiple sequence alignment (MSA) is a problem of very high computational complexity. Therefore, the MSA problem cannot practically be solved by exhaustive methods. Nowadays, MSA is commonly solved by optimizing more than one objective simultaneously. In this paper, we propose a new genetic algorithm-based alignment technique, named bi-objective sequence alignment using genetic algorithm (BSAGA). The novelty of this approach lies in its selection process: one part of the population is selected based on the Sum of Pairs score, and the rest is selected based on the Total Conserved Columns score. We apply integer-based chromosomal coding to represent only the gap positions in an alignment. Such a representation improves the ability of the search to reach an optimum, even for longer sequences. We tested BSAGA on BAliBASE and SABmark and compared its alignment scores with those of other relevant alignment techniques. BSAGA performs better than the other techniques, a result further supported by the Wilcoxon sign test.

Keywords Multiple sequence alignment · Genetic algorithm · Integer coding · Selection · Wilcoxon sign test · Experimental comparison · Bi-objective function
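The two objectives and the gap-position coding named in the abstract can be illustrated with a minimal sketch. It is not the BSAGA implementation: the function names (`decode`, `sum_of_pairs`, `total_conserved_columns`, `select`), the match/mismatch/gap values, and the split sizes `n_sp` and `n_tc` are all hypothetical choices made for illustration, and a real scorer would normally use a substitution matrix and its own gap penalties.

```python
from itertools import combinations


def decode(seq, gap_cols, width):
    """Rebuild one aligned row from its residues and the columns holding gaps.

    Assumption: a chromosome stores, per sequence, the column indices of gaps.
    """
    gaps = set(gap_cols)
    if len(seq) + len(gaps) != width or any(not 0 <= c < width for c in gaps):
        raise ValueError("gap columns do not fit the requested alignment width")
    residues = iter(seq)
    return "".join("-" if c in gaps else next(residues) for c in range(width))


def sum_of_pairs(rows, match=1, mismatch=0, gap=-1):
    """Toy Sum of Pairs: score every pair of rows in every column."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing in this sketch
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def total_conserved_columns(rows):
    """Count columns in which every row carries the same residue (no gaps)."""
    return sum(1 for col in zip(*rows) if col[0] != "-" and len(set(col)) == 1)


def select(population, n_sp, n_tc):
    """Pick n_sp alignments by SP score, then n_tc of the remainder by TC score."""
    by_sp = sorted(population, key=sum_of_pairs, reverse=True)
    chosen, rest = by_sp[:n_sp], by_sp[n_sp:]
    return chosen + sorted(rest, key=total_conserved_columns, reverse=True)[:n_tc]


rows = [decode("ACGT", [], 4), decode("AGT", [1], 4), decode("ACT", [2], 4)]
print(rows)                           # ['ACGT', 'A-GT', 'AC-T']
print(sum_of_pairs(rows))             # 4
print(total_conserved_columns(rows))  # 2
```

Storing only gap columns keeps each chromosome short relative to the full alignment, which is the property the abstract credits for scaling to longer sequences.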

Communicated by V. Loia.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00500-020-04917-5) contains supplementary material, which is available to authorized users.

Biswanath Chowdhury [email protected]

¹ Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India

² Computer Section, Saha Institute of Nuclear Physics, Kolkata, India

1 Introduction

Sequence alignment (SA) is one of the most common and fundamental tasks in bioinformatics for analyzing biological macromolecules such as DNA, RNA, and proteins. The specific residues of a sequence that play important functional and structural roles remain conserved through natural selection. Therefore, inferring evolutionary history and functional and/or structural properties from the residue conservation of a sequence is useful for finding other related sequences (Thompson et al. 2011). The process of SA arranges two or more sequences in such a way that a maximum number of identical or similar residues are matched or aligned in a column (Mount 2004). Thus, it helps to locate the sites of common portions that share a common evolutionary history. Sometimes, the relative positions of residues within orthologous sequences are disturbed by insertions and deletions (indels) of stretches of residues over evolutionary time. This leads to differences in the lengths of the sequences. An indel event is represented by introducing one or more spaces or gaps inside an alignment. A gap indicates a possible loss or absence of a residue in a sequence with respect to other set of s