A Novel Algebraic Structure of the Genetic Code Over the Galois Field of Four DNA Bases


1 Research Institute of Tropical Roots, Tuber Crops and Banana (INIVIT), Biotechnology Group, Santo Domingo, Villa Clara, Cuba
2 Center of Studies on Informatics, Central University of Las Villas, Villa Clara, Cuba
Mailing address: Robersy Sánchez, Apartado Postal 697, Santa Clara 1, CP 50100, Villa Clara, Cuba
E-mails: [email protected]; [email protected]
Received 10 February 2005; accepted 22 December 2005

ABSTRACT A novel algebraic structure of the genetic code is proposed. The principal partitions of the genetic code table are obtained as equivalence classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignment and the physicochemical properties of amino acids. Moreover, a distance function defined between the binary representations of codons in the vector space is shown to behave linearly with respect to physical variables such as the mean interaction energies of amino acids in proteins. The distance between wild-type and mutant codons also tends toward smaller values in mutational variants of four genes: human phenylalanine hydroxylase, human β-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules must be involved in the origin of the genetic code.

Key Words: genetic code vector space, genetic code algebra, genetic code Lie algebra, gene mutation

R. Sánchez and R. Grau. Acta Biotheoretica (2006) 54: 27–42. DOI: 10.1007/s10441-006-6192-9. © Springer 2006

1. INTRODUCTION The nature of the genetic code is now fairly well known. Since the second half of the 20th century, many attempts have been made to understand its internal regularity (Bashford and Jarvis, 2000; Bashford and Tsohantjis, 1998; Beland and Allen, 1994; Crick, 1964; Eck, 1963; Epstein, 1966; Jiménez-Montaño, 1966; Jukes, 1977; Volkenshtein, 1985). The code represents an extension of the four-letter alphabet of deoxyribonucleic acid (DNA) bases: adenine (A), guanine (G), cytosine (C) and thymine (T), the last replaced by uracil (U) in ribonucleic acid (RNA). As established, chemical pairing by hydrogen bonds occurs between G ≡ C and A = T (U), which means that G is the complementary base of C, and A of T (U), and vice versa. Furthermore, an association has been observed between codons having U at the second base position and the hydrophobicity of the encoded amino acids, i.e., I, L, M, F, V (one-letter amino acid symbols), whereas codons having A at the second base position code for hydrophilic or polar amino acids, i.e., D, E, H, N, K, Q, Y (Crick, 1968). Epstein (1966) has stated that amino acids cannot be randomly allocated, considering only the features of the genetic code (fully discussed by Crick, 1968), and in particular we believe that the order of codons must reflect their physicochemical properties. In any case, an optimal distribution of the table must be assumed. Gillis et al. have suggested that the genetic code can be optimised by limi