Chapter 11
* This chapter introduces probabilistic information retrieval models. Several models build on the probability ranking principle, such as the Binary Independence Model (BIM).
*The probabilistic IR model estimates the probability of a term t appearing in a relevant document, P(t|R = 1), and the probability of the term appearing in a nonrelevant document, P(t|R = 0). From these basic probabilities, the BIM computes the odds of the term appearing given that the document is relevant versus nonrelevant.
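The odds computation above can be sketched with the standard Robertson–Spärck Jones term weight, which is the log odds ratio built from P(t|R = 1) and P(t|R = 0). This is a minimal illustration on made-up counts; the function name and the 0.5 smoothing constant are conventional choices, not something fixed by the chapter.

```python
import math

def rsj_weight(df_t, rel_with_t, n_docs, n_rel):
    """Robertson-Sparck Jones weight: log odds ratio of term t
    appearing in relevant vs. nonrelevant documents.
    df_t: docs containing t; rel_with_t: relevant docs containing t.
    The +0.5 / +1.0 smoothing avoids zero probabilities."""
    p = (rel_with_t + 0.5) / (n_rel + 1.0)                    # P(t | R = 1)
    u = (df_t - rel_with_t + 0.5) / (n_docs - n_rel + 1.0)    # P(t | R = 0)
    return math.log((p * (1 - u)) / (u * (1 - p)))

# Toy collection: 100 docs, 10 judged relevant; the term occurs in
# 20 docs, 8 of which are relevant, so it should get a positive weight.
w = rsj_weight(df_t=20, rel_with_t=8, n_docs=100, n_rel=10)
```

A term that is more common in relevant than in nonrelevant documents gets a positive weight, which is exactly the odds-based ranking signal the BIM sums over query terms.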
Chapter 12
Language models
*A language model is a function that places a probability distribution over strings drawn from some vocabulary.
*The basic and most commonly used language modeling approach to IR is the Query Likelihood Model.
*In QLM, our goal is to rank documents by P(d|q), where the probability of a document is interpreted as the likelihood that it is relevant to the query. In other words, documents are ranked by the probability that a query would be observed as a random sample from the respective document model.
*Compared to other probabilistic approaches, such as BIM, the main difference initially appears to be that the LM approach does away with explicitly modeling relevance.
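The query likelihood ranking described above can be sketched with a unigram model per document. This is a small assumed implementation: the helper name `qlm_score` and the Jelinek–Mercer smoothing weight (mixing the document model with the collection model) are illustrative choices, not the only way to smooth.

```python
import math
from collections import Counter

def qlm_score(query, doc, collection, lam=0.5):
    """Log query likelihood under a unigram document model with
    Jelinek-Mercer smoothing:
    log P(q|d) = sum_t log( lam * P(t|d) + (1 - lam) * P(t|C) )."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    doc_len, coll_len = len(doc), len(collection)
    score = 0.0
    for t in query:
        p_d = doc_tf[t] / doc_len      # term probability in the document
        p_c = coll_tf[t] / coll_len    # background collection probability
        score += math.log(lam * p_d + (1 - lam) * p_c)
    return score

# Two toy documents as token lists; rank them for a two-term query.
docs = [["click", "go", "shears", "boys", "click", "click"],
        ["click", "metal", "here", "metal", "shears", "click"]]
collection = [t for d in docs for t in d]
query = ["shears", "click"]
ranking = sorted(range(len(docs)),
                 key=lambda i: qlm_score(query, docs[i], collection),
                 reverse=True)
```

The first document repeats "click" more often, so its model assigns the query a higher likelihood and it ranks first; smoothing keeps documents missing a query term from scoring log(0).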
Reading Article
Main purpose of the article: to introduce language modeling as a new method for relevance weighting, and a new method for ranking documents given Boolean queries.
The language modeling method combines the advantages of the probabilistic model and the vector space model, and it gives an elegant justification for using tf.idf weights, which are used to score the relevance of query terms. In the experiments comparing the different methods, two preprocessing tools are used: a stop-word list and a stemmer. With these tools, the new approach outperforms the traditional statistical methods. Still, the traditional approaches remain valuable because of their nice theoretical properties.
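The tf.idf weighting the article justifies can be shown in its most common form: term frequency scaled by log inverse document frequency. A minimal sketch on assumed toy numbers (the function name and the counts are illustrative):

```python
import math

def tf_idf(tf, df_t, n_docs):
    """tf.idf weight: raw term frequency scaled by log inverse document
    frequency, so terms concentrated in few documents weigh more."""
    return tf * math.log(n_docs / df_t)

# Same term frequency (tf = 4) in a 100-document collection:
# the rarer term (df = 5) outweighs the common one (df = 80).
rare = tf_idf(tf=4, df_t=5, n_docs=100)
common = tf_idf(tf=4, df_t=80, n_docs=100)
```

This is the behavior the language modeling view explains: a term's contribution grows with its frequency in the document but shrinks as it becomes common across the collection.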