A Probabilistic Multimedia Retrieval Model and Its Evaluation


Thijs Westerveld, National Research Institute for Mathematics and Computer Science (CWI), P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

Arjen P. de Vries, National Research Institute for Mathematics and Computer Science (CWI), P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

Alex van Ballegooij, National Research Institute for Mathematics and Computer Science (CWI), P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

Franciska de Jong, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands

Djoerd Hiemstra, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands

Received 21 March 2002 and in revised form 1 November 2002

We present a probabilistic model for the retrieval of multimodal documents. The model is based on Bayesian decision theory and combines models for text-based search with models for visual search. The textual model is based on the language modelling approach to text retrieval, and the visual information is modelled as a mixture of Gaussian densities. Both models have proved successful on various standard retrieval tasks. We evaluate the multimodal model on the search task of TREC’s video track. We found that the disclosure of video material based on visual information only is still too difficult. Even with purely visual information needs, text-based retrieval still outperforms visual approaches. The probabilistic model is useful for text, visual, and multimedia retrieval. Unfortunately, simplifying assumptions that reduce its computational complexity degrade retrieval effectiveness. Regarding the question whether the model can effectively combine information from different modalities, we conclude that whenever both modalities yield reasonable scores, a combined run outperforms the individual runs.

Keywords and phrases: multimedia retrieval, evaluation, probabilistic models, Gaussian mixture models, language models.

1. INTRODUCTION

Both image analysis and video motion processing have been unable to meet the requirements for disclosing the content of large-scale unstructured video archives. There appear to be two major unsolved problems in the indexing and retrieval of video material on the basis of these technologies, namely, (a) image and video processing is still far from understanding the content of a picture in the sense of a knowledge-based understanding and (b) there is no effective query language (in the wider sense) for searching image and video databases. Unlike the target content in the field of text retrieval, the content of video archives is hard to capture at the conceptual level. An increasing number of developers who accept this analysis of the state of the art in the field have started to use human language as the media interlingua, making the assumption that as long as there is no possibility to carry out both a broad scale recognition of visual objects and an automatic ma