A Joint Embedding Method for Entity Alignment of Knowledge Bases




Abstract. We propose a model that jointly learns the embeddings of multiple knowledge bases (KBs) in a uniform vector space in order to align entities across KBs. Instead of relying on content-similarity-based methods, we argue that the structure information of KBs is also important for KB alignment. In cross-lingual settings, or when two KBs use different encoding systems, the structure information of the two KBs is all we can leverage. We utilize seed entity alignments, whose embeddings are constrained to be identical during the joint learning process. We perform experiments on two datasets: a subset of Freebase comprising 15 thousand selected entities, and a dataset we construct from real-world large-scale KBs, Freebase and DBpedia. The results show that the proposed approach, which uses only the structure information of KBs, also works well.

Keywords: Embeddings · Multiple knowledge bases · Structure information · Freebase · DBpedia

1 Introduction

As knowledge bases (KBs) accumulate rapidly on the web, the problem of how to reuse them has gained more and more attention. In real-world scenarios, many KBs describe the same entities in different ways, because KBs are distributed, heterogeneous resources created by different individuals or organizations. For example, President Barack Hussein Obama is denoted by m.02mjmr in Freebase [3] but by Barack Obama in DBpedia [2]. Aligning such identical entities helps people acquire knowledge more conveniently, as they no longer need to look up multiple KBs to obtain the full information about an entity. However, knowledge base alignment is not a trivial task, and alignment systems are often complex [8,15]. Many traditional KB-matching pipeline systems, including [7,11,20,22], are based on content-similarity calculation and propagation. There are standard benchmark datasets from the Ontology Alignment Evaluation Initiative (OAEI) on which several alignment systems have been evaluated. These datasets do not contain many relationships, and the two KBs to be aligned share common relation and property strings, which can be used to compute content similarity to assist instance alignment. The statistics of the author-disambiguation dataset from the OAEI 2015 Instance Matching track are shown in Table 1. Consider a real case: we have an entity named m.02mjmr referring to President Barack Hussein Obama. How do we align it with the entity named Barack Obama in another KB when all of the relations and properties in the two KBs use different encoding systems? In such cross-lingual or differently encoded situations, the structure information of the two KBs is all we can leverage. Content information is important for KB alignment, but we argue that the structure information of KBs is also significant. Based on the observation above, we create two datasets: a subset of Freebase comprising 15 thousand selected entities (FB15K) and a dataset we construct from real-world large-scale KBs, Freebase and DBpedia.

© Springer Nature Singapore Pte Ltd. 2016. Y. Hao et al., in: H. Chen et al. (Eds.): CCKS 2016, CCIS 650, pp. 3–14, 2016. DOI: 10.1007/978-981-10-3168-7_1
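The core idea of joint learning with seed alignments can be sketched in a few lines. The following is a minimal, hypothetical illustration (the toy triples, identifiers, and TransE-style squared-distance objective are assumptions for illustration, not the paper's exact model): entities from two KBs that appear in the seed alignment set are mapped to a single shared embedding, so gradient updates over the union of both KBs' triples pull the two structures into one vector space.

```python
# Hypothetical sketch: joint KB embedding with shared seed-entity embeddings.
# The triples, namespaces, and training objective here are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy triples (head, relation, tail) from two KBs with different encodings.
kb1 = [("fb:m.02mjmr", "fb:presidentOf", "fb:m.09c7w0")]
kb2 = [("db:Barack_Obama", "db:leaderOf", "db:United_States")]
seeds = [("fb:m.02mjmr", "db:Barack_Obama")]  # known aligned entity pair

# Seed-aligned entities are collapsed to one canonical id, so both KBs'
# occurrences update the same embedding vector ("ensured the same").
canon = {e2: e1 for e1, e2 in seeds}
cid = lambda e: canon.get(e, e)

triples = [(cid(h), r, cid(t)) for h, r, t in kb1 + kb2]
ents = sorted({e for h, _, t in triples for e in (h, t)})
rels = sorted({r for _, r, _ in triples})
eidx = {e: i for i, e in enumerate(ents)}
ridx = {r: i for i, r in enumerate(rels)}

dim, lr = 16, 0.05
E = rng.normal(scale=0.1, size=(len(ents), dim))  # entity embeddings
R = rng.normal(scale=0.1, size=(len(rels), dim))  # relation embeddings

# TransE-style objective: drive h + r - t toward zero for observed triples.
for _ in range(200):
    for h, r, t in triples:
        hi, ri, ti = eidx[h], ridx[r], eidx[t]
        diff = E[hi] + R[ri] - E[ti]
        E[hi] -= lr * diff
        R[ri] -= lr * diff
        E[ti] += lr * diff

# Both KBs now live in one vector space; the seed pair shares an embedding
# by construction, so the residual of the Freebase triple is near zero.
score = float(np.linalg.norm(E[eidx["fb:m.02mjmr"]]
                             + R[ridx["fb:presidentOf"]]
                             - E[eidx["fb:m.09c7w0"]]))
```

After training, unaligned entities from the two KBs can be compared directly by distance in the shared space, which is what makes structure-only alignment possible when no content overlap exists.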