Understanding the topology and the geometry of the space of persistence diagrams via optimal partial transport

Vincent Divol¹ · Théo Lacombe¹

Received: 5 March 2020 / Accepted: 14 October 2020
© Springer Nature Switzerland AG 2020

Abstract

Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the space of persistence diagrams. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.

Keywords: Topological data analysis · Optimal transport · Statistics · Fréchet means

Mathematics Subject Classification: 62R40 · 49Q22

V. Divol and T. Lacombe contributed equally to this work as first authors.

¹ Inria Saclay, Datashape, 1 Rue Honoré d’Estienne d’Orves, 91120 Palaiseau, France

1 Introduction

1.1 Framework and motivations

Topological Data Analysis (TDA) is an emerging field in data analysis that has found applications in computer vision (Li et al. 2014), materials science (Hiraoka et al. 2016; Kramar et al. 2013), and shape analysis (Carrière et al. 2015; Turner et al. 2014), to name a few. The aim of TDA is to provide interpretable descriptors of the underlying topology of a given object. One of the most used (and theoretically studied) descriptors in TDA is the persistence diagram. This descriptor consists of a locally finite multiset of points in the upper half plane $\Omega := \{(t_1, t_2) \in \mathbb{R}^2,\ t_2 > t_1\}$, each point in the diagram corresponding informally to the presence of a topological feature (connected component, loop, hole, etc.) appearing at some scale in the filtration of an object X. A complete description of the persistent homology machinery is not necessary for this work; the interested reader can refer to Edelsbrunner and Harer (2010) for an introduction. The sp