An Introduction to Information Retrieval
Abstract
Information retrieval is a discipline that deals with the representation, storage, organization, and access to information items. The goal of information retrieval is to obtain information that might be useful or relevant to the user: library card cabinets are a “traditional” information retrieval system, and, in some sense, even searching for a visiting card in your pocket to find out a colleague’s contact details might be considered an information retrieval task. In this chapter we introduce information retrieval as a scientific discipline, providing a formal characterization centered on the notion of relevance. We touch on some of its challenges and classic applications and then dedicate a section to its main evaluation criteria: precision and recall.
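The abstract names precision and recall as the main evaluation criteria. As a minimal sketch, assuming the standard set-based definitions (precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved), both can be computed for a single query as below. The function name `precision_recall` and the document IDs are hypothetical, and the handling of empty sets is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query (illustrative sketch).

    retrieved: IDs of the documents returned by the system
    relevant:  IDs of the documents judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    # Assumed convention: an empty denominator yields 0.0 instead of raising an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system returns 4 documents, 2 of which are
# among the 5 documents judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

The two measures pull in opposite directions: returning every document in the collection drives recall to 1.0 while precision collapses, which is why they are usually reported together.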
1.1 What Is Information Retrieval?
Information retrieval (often abbreviated as IR) is an ancient discipline. For approximately 4,000 years, mankind has organized information for later retrieval and use: the ancient Romans and Greeks recorded information on papyrus scrolls, some of which had tags attached containing a short summary to save time when searching for them. Tables of contents first appeared in Greek scrolls during the second century B.C. The earliest representative of computerized document repositories for search was the Cornell SMART System, developed in the 1960s (see [68] for a first implementation). Early IR systems were mainly used by expert librarians as reference retrieval systems operating in batch mode; indeed, many libraries still use categorization hierarchies to classify their volumes. However, modern computers and the birth of the World Wide Web (1989) permanently changed how document collections are stored, accessed, and searched, making them available to the general public and indexing them for precise, large-coverage retrieval. As an academic discipline, IR has been defined in various ways [26]. Sections 1.1.1 and 1.1.2 discuss two definitions that highlight different characteristic aspects of IR: relevance, and large, unstructured data sources.
S. Ceri et al., Web Information Retrieval, Data-Centric Systems and Applications, DOI 10.1007/978-3-642-39314-3_1, © Springer-Verlag Berlin Heidelberg 2013
1.1.1 Defining Relevance
In [149], IR is defined as the discipline of finding relevant documents, as opposed to simple matches to lexical patterns in a query. This underlines a fundamental aspect of IR: the relevance of results is assessed relative to the information need, not the query. Let us illustrate this with the information need of finding out whether eating chocolate helps reduce blood pressure. We might express this via the search engine query “chocolate effect pressure”; however, we will evaluate a resulting document as relevant if it addresses the information need, not just because it contains all the words in the query—although this would be considered to be a go