Do Long and Highly Conserved Noncoding Sequences in Vertebrates Have Biological Functions?


Yoichi Gondo

Abstract

Vertebrate genomes contain only a small fraction of protein-coding sequences; the vast majority consists of repetitive and nonrepetitive noncoding sequences. With the completion of whole-genome sequencing for several species, including human, it has become possible to characterize genomic structure directly at the DNA sequence level. Taking as a first approximation that the functional portion of the genome is highly evolutionarily conserved, comparative genomics combined with bioinformatics and experimental tools is now revealing the details of each element in the genome. This chapter reviews recent efforts to extract highly conserved sequences, with a particular focus on the noncoding and nonrepetitive portions of the human and rodent genomes. Strikingly, the highly conserved sequences extracted from noncoding regions exhibit much higher conservation across many vertebrate genomes, but not in invertebrate species, than functional protein-coding sequences do. Some testable working hypotheses for how such highly conserved sequences are maintained are also reviewed and discussed.

Abbreviations

LINE: Long interspersed elements
SINE: Short interspersed elements
UTR: Untranslated region
SNP: Single nucleotide polymorphism
CNG: Conserved non-genic sequence
UCE: Ultraconserved element
POLA: DNA polymerase alpha catalytic subunit gene
LCNS: Long conserved noncoding sequence

Y. Gondo Mutagenesis and Genomics Team, RIKEN BioResource Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_12, © Springer-Verlag Berlin Heidelberg 2010


HNRNPD: Heterogeneous nuclear ribonucleoprotein D
HNRPDL: Heterogeneous nuclear ribonucleoprotein D-like
KO: Knockout

12.1 Introduction

Most higher eukaryotes contain noncoding sequences in their genomes. Classically, DNA reassociation kinetics analysis based on the self-hybridization of fragmented genomic DNA, called Cot curve analysis, revealed experimentally that significant portions of higher eukaryote genomes consist of various types of repetitive sequences (e.g., Britten and Kohne 1968; Wetmur and Davidson 1968). The extent of gene-coding sequences was also estimated by various methods, including RNA–DNA reassociation kinetics, or Rot curve analysis. For instance, Chikaraishi et al. (1978) studied the complexity of RNA expression by RNA–DNA association kinetics. They found that a fraction (31.2%) of the unique rat genomic DNA was represented in nuclear RNA of the rat brain, which exhibited the highest RNA complexity among the rat tissues tested. Based on the average length of rat nuclear RNA (4,500 nucleotides) (Bantle and Hahn 1976) and the finding that two-thirds (1.9 Gb) of the rat genome consists of unique sequences, Chikaraishi et al. (1978) estimated the total number of rat genes to be 130,000. Based on the spontaneous mutagenesis studies of viability polygenes in D