Unexpected diversity of CRISPR unveils some evolutionary patterns of repeated sequences in Mycobacterium tuberculosis

RESEARCH ARTICLE

Open Access

Guislaine Refrégier1*, Christophe Sola1* and Christophe Guyeux2

Abstract

Background: Diversity of the CRISPR locus of the Mycobacterium tuberculosis complex has been studied since 1997 for molecular epidemiology purposes. Because these studies targeted only the 43 spacers present in the first two sequenced genomes (H37Rv and BCG), they gave a biased picture of CRISPR diversity and ignored diversity in the neighbouring cas genes.

Results: We set up tailored pipelines to explore the diversity of the CRISPR-cas locus in short-read data. We analyzed data from a set of 198 clinical isolates whose representativeness was confirmed by well-characterized SNPs. We found relatively low diversity in terms of spacers: we recovered only the 68 spacers that had been described in 2000. We found no partial or global inversions in the sequences, so the Direct Variant Repeats (DVRs) always remained in the same order. In contrast, we found unexpected diversity in the form of SNPs in spacers and in Direct Repeats, duplications of various lengths, insertions of the IS6110 insertion sequence at various locations, and deletions of blocks of DVRs. The diversity was in part specific to lineages. When reconstructing the evolutionary steps of the locus, we found no evidence for SNP reversal. DVR deletions were linked to recombination between IS6110 insertions or between Direct Repeats.

Conclusion: This work definitively shows that the CRISPR locus of M. tuberculosis has not evolved by classical CRISPR adaptation (incorporation of new spacers) since the most recent common ancestor of virulent lineages. The evolutionary mechanisms that we discovered could be involved in bacterial adaptation, but in a way that remains to be identified.

* Correspondence: [email protected]; [email protected]
1 Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, cedex, 91198 Gif-sur-Yvette, France
Full list of author information is available at the end of the article

Background

Since the rise of molecular biology, repeated sequences (CRISPR, IS, VNTRs) have been used to track relatedness between individuals [1]. Indeed, they share two major features essential for diversity studies: ease of study and a high mutation rate [2]. In pathogens like the Mycobacterium tuberculosis complex (MTC), they have been used for molecular epidemiology, to complement contact tracing and/or to identify unsuspected links [1]. In the last 5 years, however, the popularity of most repeated sequences has decreased, first because they are larger than the reads produced by short-read sequencing, and second because of the widespread availability of whole-genome sequences and the use of software that analyzes Single Nucleotide Polymorphisms (SNPs) [3–5]. In fact, some of these repeated sequences have sufficient variation to be characterized from reads. The boom of Whole Genome Sequencing provides plenty of d