Language Models

Djoerd Hiemstra
University of Twente, Enschede, The Netherlands

Synonyms
Generative models
Definition
A language model assigns a probability to a piece of unseen text, based on some training data. For example, a language model based on a big English newspaper archive is expected to assign a higher probability to "a bit of text" than to "aw pit tov tags," because the words in the former phrase (or word pairs or word triples if so-called N-Gram Models are used) occur more frequently in the data than the words in the latter phrase. For information retrieval, typical usage is to build a language model for each document. At search time, the top ranked document is the one whose language model assigns the highest probability to the query.
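As a minimal sketch of this usage, the following Python fragment builds a unigram model per document and ranks documents by the probability their model assigns to a query. The toy texts, the whitespace tokenization, and the interpolation of each document model with a collection-wide model (to avoid zero probabilities for unseen query terms) are illustrative assumptions, not prescriptions from this entry.

```python
from collections import Counter

# Toy collection: each document is a list of terms (whitespace tokenization
# and the example texts are illustrative choices only).
docs = {
    "d1": "a language model assigns a probability to a piece of text".split(),
    "d2": "speech recognition systems combine an acoustic model with a language model".split(),
}

# Collection-wide term counts, used only to smooth the per-document models so
# that query terms absent from a document do not force a zero probability
# (linear interpolation with weight 0.5 is an arbitrary illustrative choice).
collection = Counter(term for terms in docs.values() for term in terms)
collection_size = sum(collection.values())

def p_term(term, doc_terms, lam=0.5):
    """P(term | document): relative frequency, interpolated with the collection model."""
    p_doc = Counter(doc_terms)[term] / len(doc_terms)
    p_coll = collection[term] / collection_size
    return lam * p_doc + (1 - lam) * p_coll

def query_likelihood(query, doc_terms):
    """Probability the document's language model assigns to the query terms."""
    prob = 1.0
    for term in query.split():
        prob *= p_term(term, doc_terms)
    return prob

# At search time, rank documents by the probability their model assigns to the query.
query = "language model probability"
ranking = sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True)
print(ranking)  # expected: ['d1', 'd2'] for this toy query
```

Practical query-likelihood systems follow this same outline and differ mainly in how P(term | document) is estimated and smoothed.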
Historical Background
The term language models originates from probabilistic models of language generation developed for automatic speech recognition systems in the early 1980s [9].
Speech recognition systems use a language model to complement the results of the acoustic model, which models the relation between words (or parts of words called phonemes) and the acoustic signal. The history of language models, however, goes back to the beginning of the twentieth century, when Andrei Markov used language models (Markov models) to model letter sequences in works of Russian literature [3]. Another famous application of language models is Claude Shannon's models of letter sequences and word sequences, which he used to illustrate the implications of coding and information theory [17]. In the 1990s, language models were applied as a general tool for several natural language processing applications, such as part-of-speech tagging, machine translation, and optical character recognition. Language models were applied to information retrieval by a number of research groups in the late 1990s [4,7,14,15], and they rapidly became popular in information retrieval research. By 2001, the ACM SIGIR conference had two separate sessions on language models containing five papers in total [12]. In 2003, a group of leading information retrieval researchers published a research roadmap, "Challenges in Information Retrieval and Language Modeling" [1], indicating that the future of information retrieval and the future of language modeling cannot be seen separately from each other.
Foundations
Language models are generative models, i.e., models that define a probability mechanism for generating language. Such generative models might be explained by the following probability mechanism: imagine picking a term T at random from this page by pointing at the page with closed eyes. This mechanism defines a probability P(T|D), which could be defined as the relative frequency of the event, i.e., the number of occurrences of the term on the page divided by the total number of terms on the page. Suppose the process is repeated n times, picking the terms T1, T2, ..., Tn one at a time. Then, a language model of the page assigns a probability to the whole sequence of picked terms, typically by multiplying the probabilities of the individual terms.
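Written out (the notation tf(t, D) for the number of occurrences of term t on page D and |D| for the total number of terms on the page is introduced here purely for illustration), this estimate and the resulting sequence probability read:

```latex
P(T = t \mid D) = \frac{\mathrm{tf}(t, D)}{|D|},
\qquad
P(T_1, T_2, \ldots, T_n \mid D) = \prod_{i=1}^{n} P(T_i \mid D)
```

For instance, on a hypothetical page of 200 terms that contains the term "retrieval" four times, the estimate would be P(retrieval|D) = 4/200 = 0.02, and a sequence of two terms with estimates 0.02 and 0.01 would receive probability 0.0002 from that page's model.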