Knowledge graph construction from multiple online encyclopedias


Tianxing Wu¹,² · Haofen Wang³ · Cheng Li¹ · Guilin Qi¹ · Xing Niu⁴ · Meng Wang¹ · Lin Li¹ · Chaomin Shi¹

Received: 11 February 2019 / Revised: 23 July 2019 / Accepted: 6 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract

In recent years, many knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. Because non-English data in Wikipedia are sparse, some projects construct knowledge graphs from multiple non-English online encyclopedias. However, they omit many technical details, which makes their frameworks and techniques hard to reuse. In this paper, we propose a new framework for constructing knowledge graphs from multiple online encyclopedias. Its core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, i.e., periodically extracting the targeted article contents across the whole of each online encyclopedia, and live extraction, which extracts only the article contents of new and updated entities. Knowledge linking uses heuristic lightweight entity matching strategies and a semi-supervised learning method to find duplicated entities and properties across different online encyclopedias. Experimental results show that our approaches to knowledge extraction and linking outperform state-of-the-art baselines on different evaluation metrics, and that our framework can generate a large-scale knowledge graph from multiple online encyclopedias as input.

Keywords Knowledge graph · Knowledge extraction · Knowledge linking · Semantic Web

This article belongs to the Topical Collection: Special Issue on Application-Driven Knowledge Acquisition
Guest Editors: Xue Li, Sen Wang, and Bohan Li

Guilin Qi
[email protected]

Haofen Wang
[email protected]

1 Southeast University, Nanjing, China
2 Nanyang Technological University, Singapore, Singapore
3 Intelligent Big Data Visualization Lab, Tongji University, Shanghai, China
4 University of Maryland, College Park, MD, USA

World Wide Web

1 Introduction

With the development of the Semantic Web, a growing amount of open structured (RDF) data has been published on the Web. Linked Data [2] initiated the effort to connect distributed data across the Web, and there are over 1,200 datasets within the Linking Open Data (LOD) community project.¹ The core datasets in LOD are knowledge graphs built from the multilingual online encyclopedia Wikipedia, such as DBpedia [15], YAGO [17] and BabelNet [19]. These multi-domain encyclopedic knowledge graphs are important foundations of various intelligent applications, e.g., semantic search, question answering and domain-specific knowledge graph construction. However, non-English data in Wikipedia are sparse, which limits the development of non-English knowledge graphs. In fact, many non-English online encyclopedias exist, such as Baidu Baike² (Chinese), Hudong Baike³ (C