Unit 4: IIR Sections 1.3, 1.4 and Chapter 6

Section 1.3 introduces the algorithm for intersecting posting lists. It also explains techniques that make intersection more efficient, such as processing the shortest posting list first. It examines how to do efficient retrieval via linear-time merges and simple query optimization.
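Here is a minimal Python sketch of the linear-time merge described above, plus the "shortest list first" ordering for a conjunctive (AND) query. The function names and the toy postings are my own illustration, not code from the book. It assumes docIDs in each posting list are sorted in increasing order.

```python
def intersect(p1, p2):
    """Linear-time merge of two sorted posting lists; returns docIDs in both."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller docID
        else:
            j += 1
    return answer


def intersect_all(postings):
    """AND several posting lists together, starting from the shortest one."""
    if not postings:
        return []
    ordered = sorted(postings, key=len)  # increasing document frequency
    result = ordered[0]
    for plist in ordered[1:]:
        if not result:  # intermediate result is empty: stop early
            break
        result = intersect(result, plist)
    return result


# Toy index (hypothetical postings)
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(intersect_all([index[t] for t in ("brutus", "caesar", "calpurnia")]))  # [2]
```

Starting from the shortest list keeps every intermediate result no longer than that list, so later merges have less to scan.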

Section 1.4 compares extended Boolean search with ranked retrieval.

The section gives several examples of Boolean queries, including queries that use proximity operators, such as those supported by westlaw.com.
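Proximity operators need word positions, not just docIDs. The following is a rough sketch of a "within k words" check, in the spirit of a /k operator. It assumes each term's positions in one document are available as a sorted list of word offsets. The helper names and the sample sentence are hypothetical, and Westlaw's exact operator semantics may differ.

```python
def positions(tokens, term):
    """Word offsets at which `term` occurs in a tokenized document (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within_k(pos1, pos2, k):
    """True if some occurrence of term 1 is at most k words from some occurrence of term 2.

    Two-pointer scan over sorted position lists: moving the pointer at the
    smaller position is the only move that can shrink the gap.
    """
    i = j = 0
    while i < len(pos1) and j < len(pos2):
        if abs(pos1[i] - pos2[j]) <= k:
            return True
        if pos1[i] < pos2[j]:
            i += 1
        else:
            j += 1
    return False


doc = "disabled workers need access to the work site".split()
print(within_k(positions(doc, "disabled"), positions(doc, "access"), 3))  # True
print(within_k(positions(doc, "disabled"), positions(doc, "access"), 2))  # False
```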

Boolean queries are precise: a document either matches the query or it does not. This offers the user greater control and transparency over what is retrieved.

However, Boolean queries also have some problems. A general problem with Boolean search is that using AND operators tends to produce high-precision but low-recall searches, while using OR operators gives low-precision but high-recall searches.
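The AND/OR trade-off can be seen on a tiny example. The postings and relevance judgments below are made up purely for illustration. Real queries will not always behave this neatly; the point is only the general tendency.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant| (sets of docIDs)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical postings for two query terms and hypothetical relevance judgments
postings_a = {1, 2, 3, 4}
postings_b = {2, 3, 5, 6}
relevant = {2, 3, 4, 5, 7}

p_and, r_and = precision_recall(postings_a & postings_b, relevant)  # a AND b
p_or, r_or = precision_recall(postings_a | postings_b, relevant)    # a OR b
print(f"AND: precision={p_and:.2f} recall={r_and:.2f}")  # AND: precision=1.00 recall=0.40
print(f"OR:  precision={p_or:.2f} recall={r_or:.2f}")    # OR:  precision=0.67 recall=0.80
```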

Chapter 6 mainly discusses the mechanisms used to rank-order the documents matching a query.

Weighted zone scoring is sometimes also referred to as ranked Boolean retrieval. Consider a set of documents, each of which has l zones, and let g_1, ..., g_l ∈ [0, 1] be zone weights such that g_1 + ... + g_l = 1. The weighted zone score of a document for a query q is defined to be

$$\sum_{i=1}^{l} g_i s_i$$

where g_i is the weight of the ith zone and s_i is the Boolean score denoting a match (or absence thereof) between q and the ith zone. For instance, the Boolean score from a zone could be 1 if all the query terms occur in that zone, and 0 otherwise; indeed, it could be any Boolean function that maps the presence of query terms in a zone to {0, 1}.
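A minimal sketch of the weighted zone score follows. It assumes each zone is plain text and that s_i is the "all query terms occur in the zone" Boolean function mentioned above. The zone names, weights and document are hypothetical.

```python
def weighted_zone_score(query_terms, doc_zones, zone_weights):
    """Return sum_i g_i * s_i, where s_i = 1 iff every query term occurs in zone i."""
    if abs(sum(zone_weights.values()) - 1.0) > 1e-9:
        raise ValueError("zone weights g_i must sum to 1")
    score = 0.0
    for zone, g in zone_weights.items():
        zone_tokens = set(doc_zones.get(zone, "").lower().split())
        s = 1 if all(t in zone_tokens for t in query_terms) else 0
        score += g * s
    return score


weights = {"author": 0.2, "title": 0.3, "body": 0.5}  # hypothetical g_i values
doc = {
    "author": "William Shakespeare",
    "title": "Julius Caesar",
    "body": "Brutus stabs Caesar in the Capitol",
}
print(weighted_zone_score(["caesar"], doc, weights))            # title + body match
print(weighted_zone_score(["brutus", "caesar"], doc, weights))  # only body matches
```

Because each s_i is 0 or 1 and the weights sum to 1, the score always lies in [0, 1].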

The weights g_i can be learned from a set of training examples. (I did not fully understand this process.)
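As far as I can reconstruct it from the book's treatment of learning weights, the simplest case has two zones, title and body, scored as g·s_T + (1 - g)·s_B. Each training example supplies s_T, s_B and a relevance judgment (1 for relevant, 0 for non-relevant), and g is chosen to minimise the total squared error between judgment and score. When s_T = s_B the score is 0 or 1 regardless of g, so only four counts matter: n_10r, n_10n, n_01r, n_01n (e.g. n_10r = examples with s_T = 1, s_B = 0, judged relevant). Setting the derivative of the total error to zero gives

$$g = \frac{n_{10r} + n_{01n}}{n_{10r} + n_{10n} + n_{01r} + n_{01n}}$$

The sketch below computes that value; the function names and training data are hypothetical.

```python
def learn_zone_weight(examples):
    """Closed-form g for score = g*s_title + (1-g)*s_body under squared error.

    examples: iterable of (s_title, s_body, relevant), s_* in {0, 1}, relevant a bool.
    """
    n10r = n10n = n01r = n01n = 0
    for s_title, s_body, relevant in examples:
        if (s_title, s_body) == (1, 0):
            if relevant:
                n10r += 1
            else:
                n10n += 1
        elif (s_title, s_body) == (0, 1):
            if relevant:
                n01r += 1
            else:
                n01n += 1
        # (0, 0) and (1, 1) give a score that does not depend on g: skipped
    denom = n10r + n10n + n01r + n01n
    if denom == 0:
        return 0.5  # no informative examples; every g is equally good, 0.5 is arbitrary
    return (n10r + n01n) / denom


def total_error(g, examples):
    """Sum of squared differences between relevance (1/0) and the two-zone score."""
    total = 0.0
    for s_title, s_body, relevant in examples:
        score = g * s_title + (1 - g) * s_body
        total += ((1 if relevant else 0) - score) ** 2
    return total


# Hypothetical training data: (s_title, s_body, relevant)
training = [(1, 0, True), (1, 0, True), (1, 0, False),
            (0, 1, False), (0, 1, True), (1, 1, True)]
g = learn_zone_weight(training)
print(g)  # 0.6
```

Comparing `total_error(g, training)` with nearby values such as 0.5 or 0.7 is a quick sanity check that the closed form really is the minimum.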
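
Tf-idf weighting scores a term in a document by taking into account both how often the term occurs in that document (term frequency, tf) and how rare the term is across the collection (inverse document frequency, idf).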
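A small sketch of tf-idf scoring follows. It assumes tf-idf(t, d) = tf(t, d) × idf(t) with idf(t) = log10(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. The query score sums tf-idf over the distinct query terms. Raw counts and base-10 logs are one common choice among several variants; the toy corpus is made up.

```python
import math
from collections import Counter


def build_index(docs):
    """Per-document term counts (tf) and document frequencies (df)."""
    tf = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each term counted once per document
    return tf, df


def tf_idf(term, doc_id, tf, df, n_docs):
    """tf(t, d) * log10(N / df(t)); 0 for terms that occur nowhere."""
    if df[term] == 0:
        return 0.0
    return tf[doc_id][term] * math.log10(n_docs / df[term])


def score(query_terms, doc_id, tf, df, n_docs):
    """Score(q, d) = sum over distinct query terms t of tf-idf(t, d)."""
    return sum(tf_idf(t, doc_id, tf, df, n_docs) for t in set(query_terms))


docs = {  # hypothetical toy corpus
    "d1": "car insurance auto insurance".split(),
    "d2": "best car".split(),
    "d3": "auto repair".split(),
}
tf, df = build_index(docs)
n_docs = len(docs)
query = ["car", "insurance"]
ranked = sorted(docs, key=lambda d: score(query, d, tf, df, n_docs), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```

Note that a term occurring in every document gets idf = log10(1) = 0 and so contributes nothing to any score.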

Sections 6.3 and 6.4 explain the functions and definitions used to represent documents and queries as vectors in a vector space and to compute the similarity between them.
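A minimal sketch of cosine similarity is shown below: the dot product of two vectors divided by the product of their Euclidean lengths. It assumes vectors are stored sparsely as {term: weight} dictionaries (for example, tf-idf weights). The sample vectors are hypothetical.

```python
import math


def cosine_similarity(v1, v2):
    """cos(v1, v2) = (v1 . v2) / (|v1| |v2|) for sparse {term: weight} vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm1 * norm2)


q = {"car": 1.0, "insurance": 1.0}            # hypothetical query vector
d_same_direction = {"car": 0.5, "insurance": 0.5}
d_partial = {"car": 1.0, "repair": 1.0}
d_unrelated = {"best": 1.0}
print(cosine_similarity(q, d_same_direction))  # approximately 1.0
print(cosine_similarity(q, d_partial))         # approximately 0.5
print(cosine_similarity(q, d_unrelated))       # 0.0
```

Dividing by the lengths means only the direction of a vector matters, so a long document is not favoured just because its raw weights are larger.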
