Persistence codebooks for topological data analysis
Bartosz Zieliński¹ · Michał Lipiński¹ · Mateusz Juda¹ · Matthias Zeppelzauer² · Paweł Dłotko³
© The Author(s) 2020
Abstract Persistent homology is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs), which are 2D multisets of points. Their variable size, however, makes them difficult to combine with typical machine learning workflows. In this paper we introduce persistence codebooks, a novel expressive and discriminative fixed-size vectorized representation of PDs that adapts to the inherent sparsity of persistence diagrams. To this end, we adapt bag-of-words, vectors of locally aggregated descriptors (VLAD) and Fisher vectors for the quantization of PDs. Persistence codebooks represent PDs in a form that is convenient for machine learning and statistical analysis, and they have a number of favorable practical and theoretical properties, including 1-Wasserstein stability. We evaluate the presented representations on several heterogeneous datasets and show their high discriminative power. Our approach yields comparable, and in part even higher, performance in much less time than alternative approaches.

Keywords Persistent homology · Machine learning · Persistence diagrams · Bag of words · VLAD · Fisher vectors
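As a rough illustration of the bag-of-words idea mentioned in the abstract, the sketch below turns diagrams of different sizes into count vectors of one fixed length. It is a minimal sketch under stated assumptions, not the authors' implementation: plain k-means with hard assignment, the birth-persistence coordinate change, and all function names (`to_birth_persistence`, `fit_codebook`, `encode`) are hypothetical choices made for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_birth_persistence(diagram: Sequence[Point]) -> List[Point]:
    """Map (birth, death) pairs to (birth, persistence) pairs (an assumed preprocessing step)."""
    return [(b, d - b) for b, d in diagram]


def nearest(point: Point, codebook: Sequence[Point]) -> int:
    """Return the index of the codeword closest to `point` in Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: math.dist(point, codebook[k]))


def fit_codebook(points: Sequence[Point], n_words: int,
                 n_iter: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means (assumed here); returns `n_words` cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(list(points), n_words)
    for _ in range(n_iter):
        clusters: List[List[Point]] = [[] for _ in range(n_words)]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Recompute each centre as the cluster mean; keep the old centre if the cluster is empty.
        centres = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centres[k]
            for k, c in enumerate(clusters)
        ]
    return centres


def encode(diagram: Sequence[Point], codebook: Sequence[Point]) -> List[int]:
    """Count how many diagram points fall to each codeword (hard assignment)."""
    counts = [0] * len(codebook)
    for p in to_birth_persistence(diagram):
        counts[nearest(p, codebook)] += 1
    return counts


# Two toy diagrams of different sizes, given as (birth, death) pairs.
diagrams = [
    [(0.0, 1.0), (0.1, 0.3), (0.5, 2.0)],
    [(0.0, 0.2), (0.4, 0.5)],
]
all_points = [p for d in diagrams for p in to_birth_persistence(d)]
codebook = fit_codebook(all_points, n_words=3)
vectors = [encode(d, codebook) for d in diagrams]  # each vector has length 3
```

Whatever the number of points in a diagram, each resulting vector has as many entries as the codebook has words, which is what makes the representation usable as input to standard classifiers.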
* Bartosz Zieliński
  [email protected]

¹ Institute of Computer Science and Computer Mathematics, Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
² Media Computing Group, Institute of Creative Media Technologies, St. Pölten University of Applied Sciences, Matthias Corvinus-Strasse 15, 3100 St. Pölten, Austria
³ Dioscuri Centre in Topological Data Analysis, Institute of Mathematics, Polish Academy of Sciences, Jana i Jedrzeja Sniadeckich 8, 00-656 Warsaw, Poland

1 Introduction

Topological data analysis (TDA) provides a powerful framework for the structural analysis of high-dimensional data. An important tool in TDA is persistent homology, PH (Edelsbrunner et al. 2002). It provides a comprehensive, multiscale summary of the shape of the underlying data and is gaining importance in data science (Ferri 2017). Recently, it has been successfully applied to computer vision problems, such as shape and texture analysis (Li et al. 2014; Reininghaus et al. 2015), 3D surface analysis (Adams et al. 2017; Zeppelzauer et al. 2017), 3D shape matching (Carrière et al. 2015), mesh segmentation (Skraba et al. 2010), and motion analysis (Vejdemo-Johansson et al. 2015). Further application areas include time series analysis (Seversky et al. 2016), music tagging (Liu et al. 2016) and social-network analysis (Hofer et al. 2017), as well as applications from the biomedical domain, e.g. biomolecular analysis (Cang and Wei 2017), brain network analysis (Lee et al. 2012), protein investigation (Gameiro et al. 2015), and from material science (Nakamura et al. 2015). Persistent homology can be efficiently