Bridging Information Retrieval and Databases

Abstract. For bridging the gap between information retrieval (IR) and databases (DB), this article focuses on the logical view. We claim that IR should adopt three major concepts from DB, namely inference, vague predicates and expressive query languages. By regarding IR as uncertain inference, probabilistic versions of relational algebra and Datalog yield very powerful inference mechanisms for IR and allow for more flexible systems. For dealing with various media and data types, vague predicates form a natural extension of text retrieval methods to attribute values, thus switching from propositional to predicate logic. A more expressive IR query language should support joins, be able to compute aggregated results, and allow for restructuring of the result objects.

N. Fuhr
N. Ferro (Ed.): PROMISE Winter School 2013, LNCS 8173, pp. 97–115, 2014. © Springer-Verlag Berlin Heidelberg 2014

1 Introduction

For several decades, information retrieval (IR) and databases (DB) have evolved as separate subfields of computer science (see e.g. the juxtaposition in [18, ch. 1]). In recent years, however, there has been increasing research activity aimed at bridging the gap between these two areas and developing approaches that integrate IR and DB features. Such an integration can take place at the physical, the logical, or the conceptual level of information systems. In this article, we focus on the logical level, mainly because there is a nice theoretical framework that supports the integration of IR and DB at this level.

In the logical view on DB, the (retrieval) task of the system can be described as follows: given a query q, find objects o which imply the query, i.e. o → q. Rijsbergen, on the other hand, defines IR as being based on uncertain inference: for a given query q, the IR system should compute the probability P(d → q) for each document d. Comparing the two definitions, we can see that IR can be regarded as a generalization of the DB approach, since it replaces deterministic by uncertain inference.

Based on this interpretation, this article discusses how three major DB concepts can be adopted and extended in order to enhance current IR systems. In the next section, we focus on inference, showing how probabilistic versions of relational algebra and Datalog increase the inferential capabilities of IR systems. Section 3 introduces vague predicates as a method for extending classical IR methods to deal with attribute values and multimedia data. Query language expressiveness is discussed in Section 4, pointing out potential benefits from more expressive IR query languages. Two further concepts are briefly addressed in Section 5, namely four-valued logic and the architecture of future IR systems. Section 6 concludes this contribution.
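
The contrast between the two retrieval tasks described in this introduction, finding objects with o → q versus computing P(d → q), can be illustrated with a small Python sketch. It is only a toy illustration under stated assumptions, not the formalism developed in the article: the collection `DOCS`, the function names `db_retrieve` and `ir_rank`, the restriction to conjunctive queries, and the independence assumption that turns a conjunction into a product are all hypothetical choices made for this example.

```python
from math import prod

# Hypothetical toy collection: each document maps an index term to the
# probability that the term correctly describes the document.
DOCS = {
    "d1": {"ir": 0.8, "db": 0.5},
    "d2": {"ir": 0.9},
    "d3": {"ir": 1.0, "db": 1.0},
}


def db_retrieve(docs, query_terms):
    """Deterministic view: return the objects o for which o -> q holds.

    In this sketch an object implies a conjunctive query only if every
    query term is present with certainty (probability 1.0).
    """
    return sorted(
        name
        for name, terms in docs.items()
        if all(terms.get(t, 0.0) == 1.0 for t in query_terms)
    )


def ir_rank(docs, query_terms):
    """Uncertain view: estimate P(d -> q) for every document and rank.

    Assumption: term events are independent, so the probability of the
    conjunction is the product of the individual term probabilities.
    """
    scores = {
        name: prod(terms.get(t, 0.0) for t in query_terms)
        for name, terms in docs.items()
    }
    # Highest probability first; ties are broken by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    query = ["ir", "db"]
    print("DB answer:", db_retrieve(DOCS, query))
    print("IR ranking:", ir_rank(DOCS, query))
```

In this sketch, the deterministic answer set coincides with the documents whose estimated probability equals 1, which is one simple way to read the statement that IR generalizes the DB approach by replacing deterministic with uncertain inference.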

2 Inference

Following Rijsbergen’s interpretation of IR as uncertain inference, this section will demonstrate the close connection between IR and the logical view on databases. For that, we start from relation
