Temporal knowledge extraction from large-scale text corpus



Yu Liu1 · Wen Hua1 · Xiaofang Zhou1

Received: 28 August 2019 / Revised: 30 April 2020 / Accepted: 10 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Knowledge, in practice, is time-variant, and many relations are valid only for a certain period of time. This phenomenon highlights the importance of harvesting temporal-aware knowledge, i.e., relational facts coupled with their valid temporal intervals. Inspired by pattern-based information extraction systems, we resort to temporal patterns to extract time-aware knowledge from free text. However, pattern design is extremely laborious and time-consuming even for a single relation, and free text is usually ambiguous, which makes temporal instance extraction extremely difficult. Therefore, in this work, we study the problem of temporal knowledge extraction in two steps: (1) temporal pattern extraction, by automatically analysing a large-scale text corpus with a small number of seed temporal facts, and (2) temporal instance extraction, by applying the identified temporal patterns. For pattern extraction, we introduce various techniques, including corpus annotation, pattern generation, scoring and clustering, to improve both the accuracy and the coverage of the extracted patterns. For instance extraction, we propose a double-check strategy to improve accuracy and a set of node-extension rules to improve coverage. We conduct extensive experiments on real-world datasets and compare with state-of-the-art systems. Experimental results verify the effectiveness of our proposed methods for temporal knowledge harvesting.

Keywords Temporal knowledge harvesting · Temporal patterns · Temporal facts · Knowledge base

Wen Hua [email protected]
Yu Liu [email protected]
Xiaofang Zhou [email protected]

1 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Queensland, Australia

World Wide Web

1 Introduction

Recently, large-scale Knowledge Bases (KBs) have been used in many algorithms, applications and tools, such as entity linking [32], relation extraction [23], question answering [9] and other advanced tasks [14, 20, 43]. Large-scale KBs, such as DBpedia [2], NELL [25], Probase [41], and YAGO [21], contain millions of entities and relational facts. However, most of them regard relational facts as time-invariant and ignore the corresponding valid time period. In fact, many relations in the real world change and evolve over time, i.e., they are valid only for a certain temporal period. For example, the relation instance SpouseOf (“Brad Pitt”, “Angelina Jolie”) is valid only over the temporal period from 2014 to 2019 (according to Wikipedia pages). This additional temporal dimension is particularly important and beneficial in many application scenarios, including QA systems, text summarisation, timeline generation, etc. [4]. Research on complementing KBs with a temporal dimension is very current. To the best of our knowledge