Enhancing Relevance Models with Adaptive Passage Retrieval



1 Department of Computer Science, Mount Holyoke College, South Hadley, MA 01075, USA
[email protected]
2 Department of Computer Science, CUNY City College, New York, NY 10031, USA
[email protected]

Abstract. Passage retrieval and pseudo relevance feedback/query expansion have been reported as two effective means of improving document retrieval in the literature. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this paper, we study passage retrieval with relevance models in the language-modeling framework for document retrieval. An adaptive passage retrieval approach to document ranking is proposed, based on the best passage of a document given a query. The proposed passage ranking method is applied to two relevance-based language models: the Lavrenko-Croft relevance model and our robust relevance model. Experiments are carried out with three query sets on three different collections from TREC. Our experimental results show that combining adaptive passage retrieval with relevance models (particularly the robust relevance model) consistently outperforms solely applying relevance models to full-length document retrieval.

Keywords: Relevance models, passage retrieval, language modeling.
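As a rough illustration of ranking a document by its best passage, the sketch below scores fixed-size overlapping windows with Dirichlet-smoothed query likelihood and keeps the maximum per document. The window size, step, smoothing parameter, and function names here are our own illustrative assumptions, not the paper's exact adaptive method.

```python
import math
from collections import Counter

def passage_score(query, passage, collection_tf, collection_len, mu=2000):
    """Query-likelihood score of one passage with Dirichlet smoothing."""
    tf = Counter(passage)
    plen = len(passage)
    score = 0.0
    for term in query:
        # Add-one estimate of the collection language model for the term.
        p_coll = (collection_tf.get(term, 0) + 1) / (collection_len + 1)
        score += math.log((tf[term] + mu * p_coll) / (plen + mu))
    return score

def best_passage_score(query, doc_terms, collection_tf, collection_len,
                       window=200, step=100):
    """Rank a document by the score of its single best fixed-size passage."""
    if len(doc_terms) <= window:
        return passage_score(query, doc_terms, collection_tf, collection_len)
    scores = []
    for start in range(0, len(doc_terms) - window + 1, step):
        scores.append(passage_score(query, doc_terms[start:start + window],
                                    collection_tf, collection_len))
    return max(scores)
```

Documents would then be sorted by `best_passage_score` instead of (or interpolated with) their full-document score.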

X. Li and Z. Zhu. In: C. Macdonald et al. (Eds.): ECIR 2008, LNCS 4956, pp. 463–471, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

The language modeling approach is a successful alternative to traditional retrieval models for text retrieval. The framework was first introduced by Ponte and Croft [19] and has since attracted a large body of related research [1, 3, 4, 8, 10-12, 14-18, 20, 21, 23]. For example, query expansion techniques [3, 11, 12, 17, 18, 21, 23], pseudo-relevance feedback [4, 11, 12, 17, 18, 21, 23], parameter estimation methods [10], multi-word features [20], passage segmentation [16] and time constraints [14] have all been proposed to improve language modeling frameworks. Among them, query expansion with pseudo feedback can increase retrieval performance significantly [11, 18, 23]. It assumes that a few top-ranked documents retrieved with the original query are relevant and uses them to generate a richer query model. However, two major problems remain unsolved in query expansion techniques. First, the performance of a significant number of queries decreases when query expansion techniques are applied on some collections. Second, existing query expansion techniques are very sensitive to the number of documents used for pseudo feedback. Most approaches achieve their best performance when about 30 documents are used for pseudo feedback; as the number of feedback documents increases beyond 30, retrieval performance drops quickly. In our recent work [15], a robust relevance model was proposed based on a study of features that affect retrieval performance. These features included key words from orig
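The pseudo-feedback expansion step described above can be sketched as follows. This is a deliberately simplified illustration (the function name, linear interpolation weight, and truncation to the top terms are our own choices, not the paper's robust relevance model): the top-ranked documents are assumed relevant, their term distribution is mixed with the original query's, and the highest-weight terms form the richer query model.

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, n_expansion=10, orig_weight=0.5):
    """Pseudo-relevance-feedback expansion: interpolate the original query
    distribution with the term distribution of the top-ranked (assumed
    relevant) documents, then keep the highest-weight terms."""
    q_model = Counter(query_terms)
    q_total = sum(q_model.values())

    # Pool term frequencies over all feedback documents.
    fb_model = Counter()
    for doc in feedback_docs:
        fb_model.update(doc)
    fb_total = sum(fb_model.values()) or 1

    expanded = {}
    for term in set(q_model) | set(fb_model):
        p_q = q_model[term] / q_total
        p_fb = fb_model[term] / fb_total
        expanded[term] = orig_weight * p_q + (1 - orig_weight) * p_fb

    top = sorted(expanded.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:n_expansion])
```

The two problems noted above show up directly in this sketch: if the feedback documents are actually non-relevant, `fb_model` pulls the expanded query off-topic, and the result depends strongly on how many documents are pooled.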