Bridging Information Retrieval and Databases
Abstract. For bridging the gap between information retrieval (IR) and databases (DB), this article focuses on the logical view. We claim that IR should adopt three major concepts from DB, namely inference, vague predicates and expressive query languages. By regarding IR as uncertain inference, probabilistic versions of relational algebra and Datalog yield very powerful inference mechanisms for IR as well as allowing for more flexible systems. For dealing with various media and data types, vague predicates form a natural extension of text retrieval methods to attribute values, thus switching from propositional to predicate logic. A more expressive IR query language should support joins, be able to compute aggregated results, and allow for restructuring of the result objects.
1 Introduction
For several decades, information retrieval (IR) and databases (DB) have evolved as separate subfields of computer science (see e.g. the juxtaposition in [18, ch. 1]). However, in recent years, there have been increasing research activities to bridge the gap between these two areas and develop approaches integrating IR and DB features. There are various levels where such an integration can take place, namely at the physical, the logical or the conceptual level of information systems. In this article, we will focus on the logical level, mainly because there is a nice theoretical framework that supports the integration of IR and DB at this level.

In the logical view on DB, the (retrieval) task of the system can be described as follows: given a query q, find objects o which imply the query, i.e. o → q. On the other hand, Rijsbergen defines IR as being based on uncertain inference, where for a given query q, the IR system should compute the probability P(d → q) for each document d. By comparing the two definitions, we can see that IR can be regarded as a generalization of the DB approach here, since it replaces deterministic by uncertain inference.

Based on this interpretation, this article discusses how three major DB concepts can be adopted and extended in order to enhance current IR systems. In the next section, we will focus on inference, showing how probabilistic versions of relational algebra and Datalog increase the inferential capabilities of IR systems. Section 3 introduces vague predicates as a method for extending classical IR methods for dealing with attribute values and multimedia data. Query language expressiveness is discussed in Section 4, pointing out potential benefits from more expressive IR query languages. Two further concepts are briefly addressed in Section 5, namely four-valued logic and the architecture of future IR systems. Section 6 concludes this contribution.

N. Fuhr. In: N. Ferro (Ed.): PROMISE Winter School 2013, LNCS 8173, pp. 97–115, 2014. © Springer-Verlag Berlin Heidelberg 2014
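The contrast drawn in the introduction between deterministic inference (o → q) and uncertain inference (P(d → q)) can be illustrated with a small sketch. It is not taken from the article: the toy collection `DOCS`, its weights, and both function names are hypothetical, the query is assumed to be a conjunction of terms, and term events are assumed to be independent so that their probabilities can simply be multiplied.

```python
from math import prod

# Hypothetical toy collection (not from the article): each document maps an
# index term to the assumed probability that the document is about that term.
DOCS = {
    "d1": {"ir": 0.8, "db": 0.5},
    "d2": {"ir": 0.9},
    "d3": {"ir": 1.0, "db": 1.0},
}


def deterministic_retrieve(docs, query_terms):
    """DB-style view: return the objects o for which o -> q holds with certainty."""
    return sorted(
        name
        for name, terms in docs.items()
        if all(terms.get(t, 0.0) == 1.0 for t in query_terms)
    )


def probabilistic_retrieve(docs, query_terms):
    """IR-style view: rank documents d by P(d -> q).

    Assumes a conjunctive query and independent term events, so the
    probability of the conjunction is the product of the term probabilities.
    """
    scores = {
        name: prod(terms.get(t, 0.0) for t in query_terms)
        for name, terms in docs.items()
    }
    # Sort by decreasing probability, breaking ties by document name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Documents with probability 0 are not returned at all.
    return [(name, p) for name, p in ranked if p > 0.0]


if __name__ == "__main__":
    query = ["ir", "db"]
    print(deterministic_retrieve(DOCS, query))
    print(probabilistic_retrieve(DOCS, query))
```

For the query `["ir", "db"]`, the deterministic variant returns only the document in which both terms hold with certainty, while the probabilistic variant also ranks documents that satisfy the query only to some degree. With all weights restricted to 0 or 1, both variants return the same set, which mirrors the view of IR as a generalization of the DB approach.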
2 Inference
Following Rijsbergen’s interpretation of IR as uncertain inference, this section will demonstrate the close connection between IR and the logical view on databases. For that, we start from relation