Thursday, January 30, 2014

Unit 4 Reading Notes

Unit 4: IIR Sections 1.3 and 1.4, Chapter 6
·          Section 1.3 introduces algorithms for intersecting postings lists. It also covers ways to make intersection more efficient, such as processing terms in order of increasing postings-list length so that the shortest lists are merged first. Together these show how to do efficient retrieval via linear-time merges and simple query optimization.
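A minimal Python sketch of the two ideas in Section 1.3. It assumes postings lists are already sorted lists of integer docIDs and that the index is a plain dict from term to postings list. The names `intersect` and `intersect_terms` and the toy index are my own, for illustration only.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists with a linear-time merge."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # docID appears in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer sitting on the smaller docID
        else:
            j += 1
    return answer


def intersect_terms(index, terms):
    """AND together several terms, starting from the shortest postings list."""
    # Query optimization: process terms in order of increasing postings length,
    # so intermediate results never grow beyond the smallest list.
    ordered = sorted(terms, key=lambda t: len(index.get(t, [])))
    if not ordered:
        return []
    result = list(index.get(ordered[0], []))
    for term in ordered[1:]:
        if not result:
            break  # nothing left to match, stop early
        result = intersect(result, index.get(term, []))
    return result


# Hypothetical toy index: term -> sorted docIDs
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(intersect_terms(index, ["brutus", "caesar", "calpurnia"]))  # [2]
```

Each merge of lists of length x and y costs O(x + y), which is why starting from the shortest list keeps the total work small.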

·          Section 1.4 compares extended Boolean retrieval with ranked retrieval.
·          It gives several examples of Boolean search queries from commercial systems such as Westlaw (westlaw.com), including queries that use proximity operators.
·          Boolean queries are precise: a document either matches the query or it does not. This offers the user greater control and transparency over what is retrieved.
·          However, Boolean queries also have problems. In general, AND operators tend to produce high-precision but low-recall searches, while OR operators give low-precision but high-recall searches.
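A proximity operator such as Westlaw's "/k" (roughly, "within k words") can be sketched in Python as below. This assumes a positional index that stores, for each term, a dict of docID to sorted word offsets; positional indexes are not covered in these notes, and the names `within_k` and `proximity_query` are hypothetical.

```python
def within_k(positions1, positions2, k):
    """True if some occurrence of term 1 is within k words of some occurrence of term 2.

    Both arguments are sorted word offsets inside a single document.
    """
    i = j = 0
    while i < len(positions1) and j < len(positions2):
        if abs(positions1[i] - positions2[j]) <= k:
            return True
        # Advance the pointer at the smaller offset; the other side can only move farther away.
        if positions1[i] < positions2[j]:
            i += 1
        else:
            j += 1
    return False


def proximity_query(pos_index, term1, term2, k):
    """DocIDs where term1 and term2 occur within k words of each other."""
    docs1 = pos_index.get(term1, {})
    docs2 = pos_index.get(term2, {})
    common = docs1.keys() & docs2.keys()  # Boolean AND on docIDs first
    return sorted(d for d in common if within_k(docs1[d], docs2[d], k))


# Hypothetical positional index: term -> {docID: sorted word offsets}
pos_index = {
    "federal": {1: [3, 20], 2: [7]},
    "tort": {1: [5], 2: [30], 3: [1]},
}
print(proximity_query(pos_index, "federal", "tort", 2))  # [1]
```

The proximity test only runs on documents that already satisfy the plain AND, so it narrows an ordinary Boolean match rather than replacing it.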

·          Chapter 6 mainly talks about the mechanisms used to rank-order the documents matching a query.
·          Weighted zone scoring is sometimes also referred to as ranked Boolean retrieval. Consider a set of documents each of which has $\ell$ zones, and let $g_1, \dots, g_\ell \in [0, 1]$ be zone weights with $\sum_{i=1}^{\ell} g_i = 1$. The weighted zone score of a document for query $q$ is defined to be
$$\text{score}(q, d) = \sum_{i=1}^{\ell} g_i s_i$$
where $g_i$ is the weight of zone $i$ and $s_i$ is the Boolean score denoting a match (or absence thereof) between $q$ and the $i$th zone. For instance, the Boolean score from a zone could be 1 if all the query terms occur in that zone, and 0 otherwise; indeed, it could be any Boolean function that maps the presence of query terms in a zone to $\{0, 1\}$.
·          The zone weights $g_i$ can be learned from a set of training examples. (I did not fully understand this process.)
·          Tf-idf weighting scores a document using both how often a query term occurs in it (term frequency, tf) and how rare the term is across the collection (inverse document frequency, idf).
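The weighted zone score $\sum_i g_i s_i$ can be sketched in Python as follows. The zone names, the weights, and the choice of "all query terms present" as the Boolean function $s_i$ are assumptions for illustration; `weighted_zone_score` is a hypothetical helper, and whitespace splitting stands in for real tokenization.

```python
def weighted_zone_score(query_terms, doc_zones, weights):
    """Weighted zone score: sum over zones of g_i * s_i.

    s_i is 1 if every query term occurs in zone i, else 0
    (one possible Boolean function among many).
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("zone weights should sum to 1")
    score = 0.0
    for zone, g in weights.items():
        tokens = set(doc_zones.get(zone, "").lower().split())
        s = 1 if all(t.lower() in tokens for t in query_terms) else 0
        score += g * s
    return score


# Hypothetical zone weights and document
weights = {"author": 0.2, "title": 0.3, "body": 0.5}
doc = {
    "author": "jane doe",
    "title": "reading shakespeare",
    "body": "notes on shakespeare and his plays",
}
print(weighted_zone_score(["shakespeare"], doc, weights))  # title + body match: 0.3 + 0.5
```

Because each $s_i$ is 0 or 1 and the weights sum to 1, the score always lies in $[0, 1]$.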
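One way to see how a weight can be "learned", assuming only two zones (title and body) with $\text{score} = g\,s_T + (1 - g)\,s_B$ and squared error against human relevance judgments (1 = relevant, 0 = nonrelevant). Training examples where both zones agree score 0 or 1 regardless of $g$, so only the mixed cases matter. Writing $n_{10r}$ for relevant examples that match the title only, $n_{01n}$ for nonrelevant examples that match the body only, and so on, the total error is $n_{10r}(1-g)^2 + (n_{10n} + n_{01r})\,g^2 + n_{01n}(1-g)^2$. Setting its derivative to zero gives (my own derivation under these assumptions) $g = \frac{n_{10r} + n_{01n}}{n_{10r} + n_{10n} + n_{01r} + n_{01n}}$. A Python sketch with a hypothetical helper name:

```python
def optimal_title_weight(examples):
    """Closed-form g for score = g * s_T + (1 - g) * s_B under squared error.

    examples: iterable of (s_T, s_B, relevant) with s_T, s_B in {0, 1}
    and relevant a bool (the human relevance judgment).
    """
    n10r = n10n = n01r = n01n = 0
    for s_t, s_b, relevant in examples:
        if s_t == 1 and s_b == 0:
            if relevant:
                n10r += 1
            else:
                n10n += 1
        elif s_t == 0 and s_b == 1:
            if relevant:
                n01r += 1
            else:
                n01n += 1
        # Examples where both zones agree contribute error that does not depend on g.
    denom = n10r + n10n + n01r + n01n
    if denom == 0:
        raise ValueError("no example matches exactly one zone, so g is undetermined")
    return (n10r + n01n) / denom


# Hypothetical training judgments: (title match, body match, relevant?)
examples = [
    (1, 0, True), (1, 0, True), (1, 0, False),
    (0, 1, True), (0, 1, False), (0, 1, False),
    (1, 1, True), (0, 0, False),
]
print(optimal_title_weight(examples))  # (2 + 2) / (2 + 1 + 1 + 2)
```

Intuitively, $g$ grows when title-only matches tend to be relevant and body-only matches tend to be nonrelevant.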
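A small Python sketch of tf-idf scoring, using $\text{idf}_t = \log(N / \text{df}_t)$ and $\text{Score}(q, d) = \sum_{t \in q} \text{tf}_{t,d} \times \text{idf}_t$. It assumes log base 10 (one common choice), raw counts for tf, and documents stored as token lists; the helper names and toy collection are hypothetical.

```python
import math


def idf(term, docs):
    """idf_t = log10(N / df_t); 0 for a term that occurs in no document."""
    n_docs = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log10(n_docs / df) if df else 0.0


def tf_idf_score(query_terms, doc, docs):
    """Score(q, d) = sum over distinct query terms t of tf_{t,d} * idf_t."""
    unique_terms = dict.fromkeys(query_terms)  # drop duplicates, keep order
    return sum(doc.count(t) * idf(t, docs) for t in unique_terms)


# Hypothetical toy collection, each document a list of tokens
docs = [
    ["car", "insurance", "auto", "insurance"],
    ["car", "repair"],
    ["best", "auto", "insurance"],
    ["cheap", "flights"],
]
for d in docs:
    print(round(tf_idf_score(["auto", "insurance"], d, docs), 3))
```

Here "auto" and "insurance" have the same idf, so the first document ranks highest simply because "insurance" occurs in it twice. A term that appears in every document gets idf 0 and contributes nothing.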

·          Sections 6.3 and 6.4 explain the vector space model: documents and queries are represented as vectors of term weights, their similarity is computed (for example, by cosine similarity), and several variant tf-idf weighting functions are defined.
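A minimal Python sketch of ranking by cosine similarity, assuming sparse vectors stored as {term: weight} dicts. The example uses raw term counts for simplicity; tf-idf or one of the variant weightings could be plugged in instead. `cosine`, `rank`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty vector as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return docIDs sorted by decreasing cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


# Hypothetical raw term-frequency vectors
doc_vecs = {
    "d1": Counter("jealous gossip jealous".split()),
    "d2": Counter("affection gossip".split()),
    "d3": Counter("wuthering heights".split()),
}
query_vec = Counter(["jealous", "gossip"])
print(rank(query_vec, doc_vecs))  # ['d1', 'd2', 'd3']
```

Dividing by both vector lengths means a long document is not favored just for repeating terms; only the direction of its vector matters.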
