Knowledge graph construction from multiple online encyclopedias

Tianxing Wu (1, 2) · Haofen Wang (3) · Cheng Li (1) · Guilin Qi (1) · Xing Niu (4) · Meng Wang (1) · Lin Li (1) · Chaomin Shi (1)

Received: 11 February 2019 / Revised: 23 July 2019 / Accepted: 6 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract

In recent years, many knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. However, because non-English data in Wikipedia are sparse, some projects construct knowledge graphs from multiple non-English online encyclopedias instead. Many technical details of these projects are missing, so it is hard to reuse their frameworks or techniques. In this paper, we propose a new framework for knowledge graph construction from multiple online encyclopedias. The core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, i.e., periodically extracting targeted article contents from the whole of each online encyclopedia, and live extraction, which extracts only the article contents of new and updated entities. Knowledge linking uses heuristic lightweight entity matching strategies and a semi-supervised learning method to find duplicated entities and properties across different online encyclopedias. Experimental results show that our approaches for knowledge extraction and linking outperform state-of-the-art baselines on different evaluation metrics, and that our framework can generate a large-scale knowledge graph from multiple input online encyclopedias.

Keywords: Knowledge graph · Knowledge extraction · Knowledge linking · Semantic Web

This article belongs to the Topical Collection: Special Issue on Application-Driven Knowledge Acquisition
Guest Editors: Xue Li, Sen Wang, and Bohan Li

Guilin Qi [email protected]
Haofen Wang [email protected]

1 Southeast University, Nanjing, China
2 Nanyang Technological University, Singapore, Singapore
3 Intelligent Big Data Visualization Lab, Tongji University, Shanghai, China
4 University of Maryland, College Park, MD, USA

World Wide Web

1 Introduction

With the development of the Semantic Web, a growing amount of open structured (RDF) data has been published on the Web. Linked Data [2] initiated the effort to connect distributed data across the Web, and there are now over 1,200 datasets within the Linking Open Data (LOD) community project. The core datasets in LOD are knowledge graphs built from the multilingual online encyclopedia Wikipedia, such as DBpedia [15], YAGO [17] and BabelNet [19]. These multi-domain encyclopedic knowledge graphs are important foundations of various intelligent applications, e.g., semantic search, question answering and domain-specific knowledge graph construction. However, non-English data in Wikipedia are sparse, which limits the development of non-English knowledge graphs. In fact, there exist many non-English online encyclopedias, such as Baidu Baike (Chinese), Hudong Baike (C
