GCViT: a method for interactive, genome-wide visualization of resequencing and SNP array data


SOFTWARE

Open Access

Andrew P. Wilkey1, Anne V. Brown2, Steven B. Cannon2 and Ethalinda K. S. Cannon2*

Abstract

Background: Large genotyping datasets have become commonplace due to efficient, cheap methods for SNP identification. Typical genotyping datasets may have thousands to millions of data points per accession, across tens to thousands of accessions. There is a need for tools to help rapidly explore such datasets, to assess characteristics such as overall differences between accessions and regional anomalies across the genome.

Results: We present GCViT (Genotype Comparison Visualization Tool), for visualizing and exploring large genotyping datasets. GCViT can be used to identify introgressions, conserved or divergent genomic regions, pedigrees, and other features for more detailed exploration. The program can be used online or as a local instance for whole-genome visualization of resequencing or SNP array data. The program compares variants among user-selected accessions to identify allele differences and similarities between those accessions and a user-selected reference, and presents the results through histogram, heatmap, or haplotype views. The resulting analyses and images can be exported in various formats.

Conclusions: GCViT provides methods for interactively visualizing SNP data on a whole-genome scale, and can produce publication-ready figures. It can be used in online or local installations. GCViT enables users to confirm or identify genomic regions of interest associated with particular traits. GCViT is freely available at https://github.com/LegumeFederation/gcvit. The 1.0 version described here is available at https://doi.org/10.5281/zenodo.4008713.

Keywords: GCViT, CViT, SNP, Resequencing, Genotype, Visualization, UI, Web service

* Correspondence: [email protected]
2 USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
Full list of author information is available at the end of the article

Background

As high-throughput genotyping costs have dropped, the dense genotyping of large germplasm collections has become commonplace. Resequencing and SNP array projects are used to identify sequence variants between multiple lines, and may be used to perform genome-wide association studies (GWAS) to find variants that are associated with phenotypes. These studies can produce millions of SNPs. For example, Torkamaneh et al. [1] identified 15 million variants among 1007 accessions of soybean, which has relatively low diversity compared with a crop such as maize. Often these data sets are used for a single GWAS, but such data sets are rich and may be repurposed for other studies. Reuse of this valuable data requires tools for visualization and analysis. Several tools exist for exploring this data. The command line tool Genotype Query Tools (GQT) [2] and its web form, webGQT [3], provide a means of indexing and querying VCF files. However, it lacks vis