Databases

CHAPTER 1.2

Databases

Wolfgang Ludwig, Karl-Heinz Schleifer and Erko Stackebrandt

Introduction

Computerization of microbiological data was first introduced by Sneath (1957) to handle the enormous amount of phenetic data collected for a numerical phenetic taxonomic (NT) analysis of the genus Chromobacterium (105 characters of 45 operational taxonomic units [OTUs]). This work paralleled that of Sokal and Michener (1958), who used electric tabulating machinery and an electromechanical desk calculator to generate a classification of bees of the Hoplitis complex (122 characters of 97 species). Sokal and Sneath joined forces to write Principles of Numerical Taxonomy (Sokal and Sneath, 1963), and they, together with Florek et al. (1951), Cain and Harrison (1958), and Rogers and Tanimoto (1960), were the first to develop and apply clustering methods (such as single- and average-linkage clustering), probabilistic distance coefficients in NT, and Jaccard's coefficient, and to address the scaling of multistate characters, parallelism and convergence, and equal weighting of characters.

Many of these algorithms and their modifications are still used today to analyze DNA and RNA electrophoretic patterns (Riboprint, denaturing gradient gel electrophoresis [DGGE], thermal gradient gel electrophoresis [TGGE], amplified fragment length polymorphism [AFLP], restriction fragment length polymorphism [RFLP], and the like), protein patterns, and fatty acid methyl ester patterns, and to evaluate ecological parameters, to name a few applications.

As Sokal (1985) noted, most larger universities acquired their first computer in 1956, but it took another 15–20 years before PCs became available to biologists. The ready availability of public databases of biological resources and their nucleic acid and protein sequences makes it easy to forget how cumbersome comparative sequence analysis was 30 years ago. Similarities were determined from sequences scattered throughout the literature, which had to be searched for, copied, and aligned by hand.
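To make two of the techniques named above more concrete, the following is a minimal Python sketch of Jaccard's coefficient and unweighted average-linkage (UPGMA-style) agglomerative clustering applied to binary phenetic characters. It is an illustration under stated assumptions, not a reconstruction of the programs used by Sneath or Sokal: each OTU is assumed to be represented by the set of characters it scores positive for, the OTU names and character names are hypothetical, and the handling of two all-negative OTUs is an assumed convention. Jaccard's coefficient counts only shared positive characters (shared absences are ignored), and average linkage scores two clusters by the mean of all pairwise OTU similarities between them.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard's coefficient: positive characters shared by both OTUs divided by
    characters positive in at least one of them (shared negatives are ignored)."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two all-negative OTUs are treated as identical
    return len(a & b) / len(union)


def average_linkage(otus: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Agglomerative clustering with unweighted average linkage on Jaccard similarity.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(name,) for name in otus]  # each cluster is a tuple of OTU names
    sim = {(x, y): jaccard(otus[x], otus[y]) for x, y in combinations(otus, 2)}

    def pair_sim(x: str, y: str) -> float:
        return sim[(x, y)] if (x, y) in sim else sim[(y, x)]

    def cluster_sim(c1: tuple[str, ...], c2: tuple[str, ...]) -> float:
        # Mean of all OTU-to-OTU similarities between the two clusters.
        total = sum(pair_sim(x, y) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the most similar pair of clusters at each step.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]),
        )
        s = cluster_sim(clusters[i], clusters[j])
        merges.append(("+".join(clusters[i]), "+".join(clusters[j]), s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical OTUs scored for a handful of hypothetical binary characters.
otus = {
    "A": {"gram_neg", "motile", "violacein", "catalase"},
    "B": {"gram_neg", "motile", "violacein"},
    "C": {"gram_pos", "spore"},
    "D": {"gram_pos", "spore", "catalase"},
}

if __name__ == "__main__":
    for a, b, s in average_linkage(otus):
        print(f"{a} + {b} at similarity {s:.3f}")
```

Here A and B join first, then C and D, and the two groups join last at a low similarity. A production analysis would more likely use an established library routine than this quadratic-per-step loop, which is kept deliberately simple for readability.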
By the late 1970s, 16S rRNA catalogues, each made up of dozens of short oligonucleotides (Uchida et al., 1974; Fox et al., 1977), were compared using a simple average-linkage cluster analysis. For those of us who had no access to a personal computer, similarities were calculated on paper. Nevertheless, as long as the fragments to be compared were short and the number of organisms analyzed was small (