Combining Word Semantics within Complex Hilbert Space for Information Retrieval
University of Borås, Borås, Sweden
Australian e-Health Research Centre, CSIRO, Brisbane, Australia
Queensland University of Technology, Brisbane, Australia
Abstract. Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics have often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR that explicitly uses complex-valued Hilbert spaces, in which terms, documents and queries are represented as complex-valued vectors. The proposal encodes distributional semantics evidence in the real component of a term vector, whereas ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested on the TREC Medical Records track task of retrieving cohorts for clinical studies.
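The abstract does not say how the two components are computed, how they are aligned dimension by dimension, or how documents are scored against queries. The following is a minimal sketch, not the authors' method, and it rests on three assumptions. First, the distributional and ontological features are hypothetical toy vectors of equal dimension that share a common basis. Second, a document or query vector is the normalised sum of its term vectors. Third, retrieval uses a Born-rule-style score |⟨q|d⟩|², where the Hermitian inner product conjugates the query. The names DISTRIBUTIONAL, ONTOLOGICAL, term_vector, text_vector and score are hypothetical.

```python
import math

# Hypothetical toy features over 3 shared dimensions; values invented for illustration.
# Real component: distributional-semantics evidence (e.g. co-occurrence weights).
DISTRIBUTIONAL = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.1, 0.9],
}
# Imaginary component: ontological evidence (e.g. concept membership).
ONTOLOGICAL = {
    "heart": [1.0, 0.0, 0.0],
    "cardiac": [1.0, 0.0, 0.0],
    "fracture": [0.0, 0.0, 1.0],
}


def term_vector(term):
    """Complex term vector: real part distributional, imaginary part ontological."""
    return [complex(r, i) for r, i in zip(DISTRIBUTIONAL[term], ONTOLOGICAL[term])]


def normalise(vec):
    """Scale a complex vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in vec))
    return [c / norm for c in vec] if norm > 0 else vec


def text_vector(terms):
    """Assumption: a document or query is the normalised sum of its term vectors."""
    summed = [sum(parts) for parts in zip(*(term_vector(t) for t in terms))]
    return normalise(summed)


def inner(u, v):
    """Hermitian inner product <u|v>, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def score(query_terms, doc_terms):
    """Assumption: Born-rule-style score |<q|d>|^2, which lies in [0, 1] for unit vectors."""
    return abs(inner(text_vector(query_terms), text_vector(doc_terms))) ** 2


query = ["cardiac"]
for doc in (["heart"], ["fracture"]):
    print(doc, round(score(query, doc), 4))
```

With these toy values, the query "cardiac" scores the "heart" document far higher than the "fracture" document. If every ONTOLOGICAL entry is set to zero, the sketch reduces to a real-valued vector-space model that scores with squared cosine similarity. That case shows how the imaginary component adds evidence to the real one without replacing it.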
H. Atmanspacher et al. (Eds.): QI 2013, LNCS 8369, pp. 160–171, 2014. © Springer-Verlag Berlin Heidelberg 2014. DOI: 10.1007/978-3-642-54943-4_14
1 Introduction
In quantum theory, states are represented by vectors in a complex-valued Hilbert space. Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. For example, the quantum interference term in the law of total probability for disjoint events arises mathematically because the probability amplitudes of events are modelled by complex numbers. Quantum-like formalisms have been proposed to model systems outside physics, for example in cognitive science, decision making and economics. In information retrieval (IR), the pioneering work by van Rijsbergen [1] showed that the quantum formalism encompasses many state-of-the-art retrieval models; subsequent works proposed many quantum-like models for IR [2]. Common to all these proposals is the assumption that information objects (queries, documents, etc.) are represented in real-valued Hilbert spaces, even when the key modelling aspect is the quantum interference phenomenon [3]. Zuccon and Piwowarski argued that this assumption is not imposed by the models themselves, which, being grounded in the mathematics of quantum theory, allow for complex-valued representations. Instead, it is rooted in the difficulty of understanding how complex numbers could be obtained from term counts in documents [4]. We derive a complex-valued representation of information by encoding semantics in complex numbers. The proposal helps to increase t
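The interference term in the law of total probability can be made concrete with a small numerical sketch. Suppose an event A can be reached through two disjoint alternatives, B1 and B2, with complex amplitudes a1 and a2. The magnitudes used below are hypothetical. The probability |a1 + a2|² then splits into the classical sum |a1|² + |a2|² plus an interference term 2 Re(conj(a1) a2) = 2|a1||a2| cos θ, where θ is the relative phase between the two amplitudes. The function name decompose is hypothetical.

```python
import cmath
import math


def decompose(a1, a2):
    """Split |a1 + a2|^2 into the classical part and the interference term."""
    classical = abs(a1) ** 2 + abs(a2) ** 2
    # Re(conj(a1) * a2) = |a1||a2|cos(theta), theta = phase(a2) - phase(a1)
    interference = 2 * (a1.conjugate() * a2).real
    total = abs(a1 + a2) ** 2
    return classical, interference, total


# Hypothetical magnitudes; only the relative phase theta is varied.
m1, m2 = math.sqrt(0.3), math.sqrt(0.2)
for theta in (0.0, math.pi / 3, math.pi / 2, math.pi):
    a1 = cmath.rect(m1, 0.0)
    a2 = cmath.rect(m2, theta)
    classical, interference, total = decompose(a1, a2)
    print(f"theta={theta:.3f} classical={classical:.4f} "
          f"interference={interference:+.4f} total={total:.4f}")
```

At θ = π/2 the interference term vanishes and the classical sum is recovered. At θ = 0 and θ = π the term takes its extreme values ±2|a1||a2|. For fixed magnitudes, these two values are the only ones a real-valued representation allows. Complex amplitudes let the term vary continuously between them.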