El-MAVEN: A Fast, Robust, and User-Friendly Mass Spectrometry Data Processing Engine for Metabolomics

Analysis of large metabolomic datasets is becoming commonplace with the increased realization of the role that metabolites play in biology and pathophysiology. While there are many open-source analysis tools to extract peaks from liquid chromatography-mas

Shubhra Agrawal et al.

Angelo D'Alessandro (ed.), High-Throughput Metabolomics: Methods and Protocols, Methods in Molecular Biology, vol. 1978, https://doi.org/10.1007/978-1-4939-9236-2_19, © Springer Science+Business Media, LLC, part of Springer Nature 2019

Introduction

Metabolomics is an important data type that augments genomic, proteomic, and other -omic datasets to provide increased understanding of biology and disease at a systems level [1–8]. Liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) are techniques commonly used in metabolomic studies to identify and quantify metabolites important in different biological, chemical, or disease conditions. Many such studies involve a large number of samples probed under various experimental conditions, resulting in large datasets that cannot be reliably analyzed or quantified using tedious manual processes. The volume and size of data generated by these techniques necessitate robust, stable, and accurate analysis software that enables scientists to explore the available data in an interactive, efficient, and reliable manner. Access to such software accelerates the scientific cycle of measurement, analysis, and hypothesis generation. Because of the need for automated analysis of LC-MS and GC-MS data, multiple data analysis software platforms have been developed. These include open-source software such as MAVEN [9, 10], XCMS [11–13], and MZmine [14, 15], and proprietary software such as SCIEX MultiQuant, among others. While open-source software platforms facilitate mass spectrometry data analysis, these tools have well-documented issues, such as a large number of false positives and inaccuracies in peak detection [16–18]. Because of such inaccuracies, data analysts often have to review results and adjust analysis parameters to ensure accurate results. However, most of the available tools have poor user-interface capabilities, with few or no interactive visualizations. This forces many rounds of analysis, exporting, viewing results in separate software, and reanalysis to obtain accurate peak results [9]. The process becomes tedious, leading some users to create ad hoc pipelines to achieve easier workflows [19]. Additionally, many available software platforms become slow and unreliable when used on high-throughput datasets containing a large number of samples or metabolites. In certain cases, there is also a lack of an active support community and of detailed documentation. These factors introduce delays and challenges when trying to gain biological insight from large raw -omics datasets. Proprietary software platforms can be cost-prohibitive and are restricted to vendor-specific file formats. They also suffer from the same data analysis and performance challenges when used with large datasets.
Here we describe El-MAVEN, an open-source, vendor-neutral software platform that allows interactive, fast, efficient, and reliable analysis of LC-MS, GC-MS, and LC-MS/MS datasets in just four steps fr