Model-based clustering for flow and mass cytometry data with clinical information

RESEARCH

Open Access

Ko Abe1, Kodai Minoura1,2, Yuka Maeda3, Hiroyoshi Nishikawa2,3 and Teppei Shimamura1*

From The 18th Asia Pacific Bioinformatics Conference Seoul, Korea. 18–20 August 2020

*Correspondence: [email protected]
1 Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, 4668550 Nagoya, Japan
Full list of author information is available at the end of the article

Abstract

Background: High-dimensional flow cytometry and mass cytometry allow systemic-level characterization of more than 10 protein profiles at single-cell resolution and provide a much broader landscape in many biological applications, such as disease diagnosis and prediction of clinical outcome. When associating clinical information with cytometry data, traditional approaches require two distinct steps: identification of cell populations, followed by a statistical test to determine whether the difference between two population proportions is significant. These two-step approaches can lead to information loss and analysis bias.

Results: We propose a novel statistical framework, called LAMBDA (Latent Allocation Model with Bayesian Data Analysis), for simultaneous identification of unknown cell populations and discovery of associations between these populations and clinical information. LAMBDA uses separate probabilistic models tailored to the distributional characteristics of flow and mass cytometry data, respectively. We use a zero-inflated distribution for the mass cytometry data, based on the characteristics of the data. A simulation study confirms the usefulness of this model by evaluating the accuracy of the estimated parameters. We also demonstrate that LAMBDA can identify associations between cell populations and their clinical outcomes by analyzing real data. LAMBDA is implemented in R and is available from GitHub (https://github.com/abikoushi/lambda).

Keywords: Flow cytometry, Mass cytometry, Bayesian mixture model, Stochastic EM algorithm

Background

The recent development of high-dimensional flow cytometry and mass cytometry (CyTOF) allows for characterizing cell types and states by detecting the expression levels of pre-defined sets of surface and intracellular proteins at single-cell resolution [1]. For an individual subject, modern flow cytometry data consist of 20 or more protein

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creativ