Multi-granularity semantic representation model for relation extraction


ORIGINAL ARTICLE

Ming Lei¹ • Heyan Huang¹ • Chong Feng¹

Received: 13 March 2020 / Accepted: 26 October 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract In natural language, a group of words constitutes a phrase, and several phrases constitute a sentence. However, existing transformer-based models for sentence-level tasks abstract sentence-level semantics directly from word-level semantics. This bypasses phrase-level semantics, which may make these models less suited to capturing precise semantics. To resolve this problem, we propose a novel multi-granularity semantic representation (MGSR) model for relation extraction. The model bridges the semantic gap between low-level and high-level semantic abstraction by successively learning word-level, phrase-level, and sentence-level semantic representations. We segment a sentence into entity chunks and context chunks according to an entity pair, so that the sentence is represented as a nonempty segmentation set. The entity chunks are noun phrases, and the context chunks contain the key phrases expressing semantic relations. The MGSR model then applies three different self-attention mechanisms (inter-word, inner-chunk, and inter-chunk) to learn the multi-granularity semantic representations. Experiments on two standard datasets demonstrate that our model outperforms previous models.

Keywords Relation extraction · Information extraction · Natural language processing · Deep learning
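The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a five-way split (left context, first entity, middle context, second entity, right context) in which empty chunks are dropped, which is one plausible reading of "nonempty segmentation set". The function name `segment`, the half-open token spans, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]            # half-open token span [start, end)
Chunk = Tuple[str, List[str]]     # (label, tokens), label is "entity" or "context"


def segment(tokens: List[str], e1: Span, e2: Span) -> List[Chunk]:
    """Split a tokenized sentence into entity and context chunks.

    Assumption (not taken from the paper): the two entity spans are
    non-empty and do not overlap; empty context chunks are dropped.
    """
    (s1, t1), (s2, t2) = sorted([e1, e2])  # order the entities left to right
    if not (0 <= s1 < t1 <= s2 < t2 <= len(tokens)):
        raise ValueError("entity spans must be non-empty and non-overlapping")
    candidates = [
        ("context", tokens[:s1]),    # words left of the first entity
        ("entity", tokens[s1:t1]),   # first entity chunk
        ("context", tokens[t1:s2]),  # words between the two entities
        ("entity", tokens[s2:t2]),   # second entity chunk
        ("context", tokens[t2:]),    # words right of the second entity
    ]
    return [(label, chunk) for label, chunk in candidates if chunk]


# Made-up sentence for illustration only.
tokens = "A soldier is a member of the UN troops that arrived in Africa".split()
for label, chunk in segment(tokens, (7, 9), (12, 13)):
    print(label, chunk)
```

Because the second entity ends the sentence, the right context is empty and is dropped, so this example yields four chunks instead of five. Concatenating the chunks in order always reproduces the original token sequence.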

1 Introduction

Relation extraction (RE) aims to identify the semantic relations between pairs of entities in sentences. It is useful in question answering, machine translation, and other natural language processing (NLP) tasks. Generally, a triplet is used as the format of the structured representation. As shown in Fig. 1, there are three entities and two relation triplets in the example sentence. The expression of the relation type MC (Member-Collection) depends mainly on the words “a member of,” which lie between the target entity pair, while the expression of ED (Entity-Destination) depends on the words “arrived in,” which likewise lie between its entity pair. We call these words the key phrases expressing semantic relations. However, “arrived in” also lies between “UN troops” and “Africa,” so why is there no ED relation between them? Possibly the word “of,” which lies to the left of this entity pair, is the key phrase for relation classification. This shows that both the key phrases and the relative position information could be important in the relation extraction task.

Many models leverage the dependency structures in sentences to capture the key phrases, and they have achieved great success in the relation extraction task. For instance, the studies in [18, 32] utilized t