KGAnet: a knowledge graph attention network for enhancing natural language inference


ORIGINAL ARTICLE

Meina Song¹ • Wen Zhao¹ • E. HaiHong¹

Received: 15 May 2019 / Accepted: 14 March 2020
© The Author(s) 2020

Abstract

Natural language inference (NLI) is a basic task underlying many applications, such as question answering and paraphrase recognition. Existing methods have solved the key issue of how an NLI model can benefit from external knowledge. Inspired by this, we explore two further problems: (1) how to make better use of external knowledge when the total amount of such knowledge is constant, and (2) how to bring external knowledge to the NLI model more conveniently in application scenarios. We propose a novel joint training framework that consists of a modified graph attention network, called the knowledge graph attention network (KGAnet), and an NLI model. We demonstrate that the proposed method outperforms the existing method that introduces external knowledge, and that it improves the performance of multiple NLI models without additional external knowledge.

Keywords Natural language processing · Natural language inference · External knowledge

¹ Beijing University of Posts and Telecommunications, Beijing, China

1 Introduction

Natural language inference (NLI), also known as recognizing textual entailment, is a challenging and fundamental task in natural language understanding. Its aim is to determine the relationship (entailment, neutral, or contradiction) between a premise and a hypothesis. In the past few years, large annotated datasets, such as the Stanford NLI (SNLI) dataset [1] and the Multi-Genre NLI (MultiNLI) corpus [2], have become available, making it possible to train fairly complex neural network-based models with large numbers of parameters to better solve NLI problems. These models fall into two main categories: sentence-encoding models and inter-sentence models.

Sentence-encoding models follow the Siamese structure [3]: they encode the premise and the hypothesis into sentence vectors and then compare the distance between these vectors to obtain the relationship category. Talman et al. [4] used a hierarchical biLSTM and max pooling architecture to encode sentences into vectors. Nie et al. [5] used shortcut-stacked sentence encoders to perform multi-domain semantic matching. Shen et al. [6] combined hard and soft attention with reinforcement learning for sequence modeling. Im and Cho [7] proposed a distance-based self-attention network, which accounts for word distance using a simple distance mask. Yoon et al. [8] designed dynamic self-attention by modifying the dynamic routing in a capsule network [9] for natural language processing. The encoders used in these models include convolutional neural networks (CNNs) [10], recurrent neural network variants [1, 5, 11], and self-attention networks [12]. In contrast to the above methods, in