VEGA: visual comparison of phylogenetic trees for evolutionary genome analysis (ChinaVis 2019)



REGULAR PAPER

Tong Ge • Yonghua Lu • Kecheng Lu • Yunhai Wang • Xin Liu • Zhanglin Cheng • Yi Chen • Oliver Deussen • Baoquan Chen




Received: 5 July 2019 / Revised: 10 August 2019 / Accepted: 4 February 2020
© The Visualization Society of Japan 2020

Abstract In the field of evolutionary genome analysis, biologists seek to identify important genes or chromosome regions by comparing phylogenetic trees and analyzing which loci carry mutations that might affect phenotypic traits. Unfortunately, tree comparison and the accompanying analysis are often performed manually. In this paper, we characterize the workflow of evolutionary genome analysis and present a task analysis of the fundamental questions biologists ask during the analysis procedure. We propose two algorithms to enable quantitative tree comparison: one measures the differences between corresponding leaf nodes on two trees, and the other computes the classification inconsistency of each leaf node by comparing the tree structure with a given biological classification. Building on the obtained differences and inconsistencies, we present a visual analysis system, VEGA (visual comparison of phylogenetic trees for evolutionary genome analysis), which enables biologists not only to explore trees intuitively but also to identify loci that affect traits by comparing the SNP variants of selected leaf nodes. We conclude with case studies from two biologists who used our system to augment their previous manual analysis workflow, demonstrating that our system can reveal more insight.

Keywords Visual analysis · System · Genome · Phylogenetic tree

Tong Ge and Yonghua Lu assert equal contribution and joint first authorship.

T. Ge (✉) · K. Lu · Y. Wang · B. Chen
Shandong University, Jinan, China
E-mail: [email protected]

Y. Lu
Shenzhen Investigation and Research Institute Co., Ltd, Shenzhen, China

X. Liu
BGI, Shenzhen, China

Z. Cheng
SIAT, Shenzhen, China

Y. Chen (✉)
Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing, China
E-mail: [email protected]

O. Deussen
University of Konstanz, Konstanz, Germany


1 Introduction

The rapid development of high-throughput sequencing technologies enables whole-genome sequencing at an unprecedented rate, and it has been applied to the study of many different organisms. With the resulting whole-genome sequence data, biologists seek to investigate the genome-wide variation patterns of a species. More specifically, they want to identify the genes that might have affected its evolutionary history or that are responsible for significant phenotypic traits that changed during evolution, such as yield, color, and size. This identification is facilitated by comparing genomic variation among varieties. Once these genes are identified, they might be used as the basis for future genomic-enabled breeding (Rubin et al. 2010; Li and Zhang 2013) or diagnosing (Xun et al. 2