Learning Equations from Biological Data with Limited Time Samples


John T. Nardini, et al. [full author details at the end of the article]

Received: 19 May 2020 / Accepted: 16 August 2020
© Society for Mathematical Biology 2020

Abstract

Equation learning methods present a promising tool to aid scientists in the modeling process for biological data. Previous equation learning studies have demonstrated that these methods can infer models from rich datasets; however, the performance of these methods in the presence of common challenges from biological data has not been thoroughly explored. We present an equation learning methodology composed of data denoising, equation learning, model selection, and post-processing steps that infers a dynamical systems model from noisy spatiotemporal data. The performance of this methodology is thoroughly investigated in the face of several common challenges presented by biological data, namely, sparse data sampling, large noise levels, and heterogeneity between datasets. We find that this methodology can accurately infer the correct underlying equation and predict unobserved system dynamics from a small number of time samples when the data are sampled over a time interval exhibiting both linear and nonlinear dynamics. Our findings suggest that equation learning methods can be used for model discovery and selection in many areas of biology when an informative dataset is used. We focus on glioblastoma multiforme modeling as a case study in this work to highlight how these results inform data-driven, model-based predictions of tumor invasion.

Keywords Equation learning · Numerical differentiation · Sparse regression · Model selection · Partial differential equations · Parameter estimation · Population dynamics · Glioblastoma multiforme

This material was based upon work partially supported by the National Science Foundation under Grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute and IOS-1838314 to KBF, and in part by National Institute of Aging Grant R21AG059099 to KBF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. BM gratefully acknowledges Ph.D. studentship funding from the UK EPSRC (reference EP/N50970X/1). AHD, LC, and KRS gratefully acknowledge funding through the NIH U01CA220378 and the James S. McDonnell Foundation 220020264.


John T. Nardini [email protected]

Extended author information available on the last page of the article


1 Introduction

Mathematical models are a crucial tool for inferring the mechanics underlying a scientific system of study (Nardini et al. 2016) or predicting future outcomes (Ferguson et al. 2020). The task of interpreting biological data in particular benefits from mathematical modeling, as models allow biologists to test multiple hypotheses in silico (Ozik et al. 2018), optimally design experiments