An overview of distance and similarity functions for structured data

PDF / 2,516,402 Bytes
43 Pages / 439.37 x 666.142 pts Page_size
24 Downloads / 231 Views

An overview of distance and similarity functions for structured data Santiago Ontañón1,2

© Springer Nature B.V. 2020

Abstract The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work. Keywords Distance · Similarity · Structured data · Relational learning

1 Introduction The complementary notions of distance and similarity play a key role in many machine learning approaches, such as instance-based learning (Aha et al. 1991), kernel-based methods (Vert et al. 2004), case-based reasoning (Aamodt and Plaza 1994), or clustering algorithms (Ng et al. 2002; Kaufman and Rousseeuw 1987). Distance and similarity functions are also relevant for artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations (Tversky 1977). Specifically this paper presents an overview of distance and similarity functions for structured representations of data, such as graphs or frames. While distance functions for propositional (i.e. feature-vector) representations have been thoroughly studied in the past, work on distance functions for structured representations has been carried out in different communities such as graph matching, inductive logic programming, case-based reasoning, relational learning or graph mining and is much less * Santiago Ontañón [email protected]; [email protected] 1

Google Research, Mountain View, CA 94043, USA

2

Drexel University, Philadelphia, PA 19104, USA

13

Vol.:(0123456789)

S. Ontañón

understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions. Therefore, the goal of this paper is to provide an overview of this work in order to have a complete view of the field of distance functions for structure representations, and lay foundations for future work. Structured data representations are important, since, there are many real-world application domains for which data of interest is inherently structured and it is hard to represent it using a propositional representation. Consider, for example, a biomedical domain where we are in

Data Loading...

An overview of distance and similarity functions for structured data

Recommend Documents

Similarity-Based Correlation Functions for Binary Data

Distance and Similarity Measures

Distance and Similarity Measures

Organic and nano-structured composite photovoltaics: An overview

Structure, Metabolism and Functions of Amino Acids: An Overview

Quantum Algorithms for Similarity Measurement Based on Euclidean Distance

Unsupervised Deep Hashing with Structured Similarity Learning

Learning Distance Functions for Automatic Annotation of Images

Efficient processing of graph similarity queries with edit distance constraints

Similarity-based Data Partitioning

On the family-free DCJ distance and similarity

Semi-Structured Data