Dual embeddings and metrics for word and relational similarity

Dandan Li¹ · Douglas Summers-Stay¹

¹ U.S. Army Research Laboratory, Adelphi, MD, USA

© This is a U.S. Government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019

Abstract Word embedding models excel at measuring word similarity and completing analogies. Word embeddings based on different notions of context trade off strengths in one area for weaknesses in another. Linear bag-of-words contexts, as in word2vec, better capture topical similarity, while dependency-based word embeddings better encode functional similarity. By combining these two kinds of word embeddings using different metrics, we show how the best aspects of both approaches can be captured. We show state-of-the-art performance on standard word and relational similarity benchmarks.

Keywords Word embeddings · Dual embeddings · Word similarity · Relational similarity

Mathematics Subject Classification (2010) 68 · 68T30 · 68T50
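As a minimal sketch of the combination described in the abstract (not the exact metrics developed in this paper), the snippet below loads a bag-of-words word2vec space and a dependency-based space and interpolates their cosine similarities. The model file names, the mixing weight alpha, and the example word pairs are illustrative assumptions only.

    # Minimal sketch: combine a bag-of-words space and a dependency-based space
    # by a weighted sum of cosine similarities. File names are hypothetical
    # placeholders for pretrained embeddings.
    import numpy as np
    from gensim.models import KeyedVectors

    bow = KeyedVectors.load_word2vec_format("word2vec_bow.bin", binary=True)
    dep = KeyedVectors.load_word2vec_format("deps_embeddings.txt", binary=False)

    def cosine(u, v):
        # Standard cosine similarity between two vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def dual_similarity(w1, w2, alpha=0.5):
        # Interpolate topical (bag-of-words) and functional (dependency-based)
        # similarity; alpha is a tunable mixing weight.
        s_topical = cosine(bow[w1], bow[w2])
        s_functional = cosine(dep[w1], dep[w2])
        return alpha * s_topical + (1.0 - alpha) * s_functional

    print(dual_similarity("coffee", "cup"))   # related mainly topically
    print(dual_similarity("coffee", "tea"))   # also functionally similar

Varying alpha trades off topical relatedness against functional similarity, which is the intuition behind using the two embedding spaces jointly.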

1 Introduction

Accurate measures of semantic and relational similarity can be applied to many natural language processing (NLP) tasks, including query expansion, word sense disambiguation, machine translation, information extraction, and question answering. Tasks as diverse as musical parody [14] and text animation [41] require the ability to correctly characterize which word pairs are similar. Previous work addressing the problem includes: (1) learning word embeddings from large text corpora [8, 23, 29–31, 46]; (2) extracting knowledge from knowledge graphs such as WordNet [2, 20, 50] and ConceptNet [5]; and (3) combining the above two models [1, 21, 53]. Bengio et al. [3] and Collobert and Weston [7] pioneered word embedding models, in which each word is represented by a dense vector and semantically similar words are mapped to nearby vectors. Mikolov et al. [29] introduced the skip-gram embedding model trained using negative sampling [30], commonly known as word2vec, which can efficiently create word embeddings from large bodies of text. Word representations learned from distributional models capture not only similarities between words but also analogical similarities between pairs of words [31]. Levy and Goldberg [23] generalized this model to arbitrary word contexts and introduced dependency-based word embeddings, which use syntactic contexts derived from dependency parse trees. The word2vec model with linear bag-of-words contexts tends to capture broad topical similarity, while dependency-based word embeddings with syntactic contexts are better at representing functional similarity [23]. Turney [46] proposed the dual-space model to unify semantic relations and compositions: it includes a domain space based on the nouns that occur nearby and a function space characterized by a word's syntactic relation to nearby verbs. This paper contributes to the following a
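As a hedged illustration of the analogical (vector-offset) property of distributional embeddings noted above, the short sketch below completes the analogy man : king :: woman : ? using gensim's KeyedVectors; the model path is a hypothetical placeholder and the printed score depends on the pretrained vectors.

    # Minimal sketch of analogy completion by vector offset, assuming a
    # pretrained word2vec model; "word2vec_bow.bin" is a placeholder path.
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("word2vec_bow.bin", binary=True)

    # vec("king") - vec("man") + vec("woman") should lie near vec("queen").
    print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # e.g. [('queen', 0.71)]  -- the exact score varies with the model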