1.3 Processing Boolean queries
Simple conjunctive query:
- Locate word in the Dictionary
- Retrieve its postings
- Locate another word in the Dictionary
- Retrieve its postings
- Intersect the two postings lists
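The steps above can be sketched as a linear merge over two postings lists, assuming each list is sorted by ascending docID. The toy `index` dictionary and the function name `intersect` are illustrative assumptions, not part of the notes.

```python
def intersect(p1, p2):
    """Intersect two postings lists that are sorted by ascending docID."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID occurs in both lists: keep it and advance both pointers
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that sits on the smaller docID
        else:
            j += 1
    return answer


# Hypothetical toy dictionary: term -> sorted postings list of docIDs
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "calpurnia": [2, 31, 54, 101],
}

# "Locate word, retrieve postings" is a dictionary lookup here
result = intersect(index["brutus"], index["calpurnia"])
print(result)
```

Because both pointers only move forward, the merge takes time proportional to the combined length of the two lists.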
Query optimization is the process of selecting how to organize the work of answering a query so that the least total amount of work needs to be done by the system.
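One common heuristic for conjunctive queries is to process the postings lists in order of increasing length, so that intermediate results stay as small as possible. The sketch below assumes sorted postings lists; `intersect_sorted` and `intersect_many` are hypothetical helper names used only for illustration.

```python
def intersect_sorted(p1, p2):
    """Linear merge of two postings lists sorted by ascending docID."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_many(postings_lists):
    """AND together several postings lists, shortest list first."""
    if not postings_lists:
        return []
    # Shortest list first: no intermediate result can be longer than it
    ordered = sorted(postings_lists, key=len)
    result = ordered[0]
    for postings in ordered[1:]:
        if not result:
            break  # an empty intermediate result cannot grow again
        result = intersect_sorted(result, postings)
    return result


docs = intersect_many([[1, 2, 3, 4, 5, 6], [2, 4, 6], [4, 6, 8]])
print(docs)
```

The answer is the same for any processing order; only the amount of work changes.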
1.4 The extended Boolean model versus ranked retrieval
The Boolean retrieval model was in use mostly before the 1990s. Systems used not just the basic Boolean operations (AND, OR, and NOT), but also extended Boolean retrieval models that incorporate additional operators, such as term proximity operators.
6 Scoring, term weighting and the vector space model
Free text queries cannot be matched appropriately by Boolean retrieval, because intersection retrieves every document that matches all the words, which is often far too many, and returns them unranked. Other approaches arise:
- Parametric and zone indexes
- Weighted zone scoring
- Learning weights: one of a class of approaches to scoring and ranking in information retrieval, known as machine-learned relevance.
- The optimal weight g
- Inverse document frequency
- tf*idf weighting (term frequency * inverse document frequency)
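As a minimal sketch of the last two items, the snippet below computes tf*idf weights for a tiny tokenized collection. It assumes raw term counts for tf and idf = log10(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf*idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append(
            {term: tf[term] * math.log10(n_docs / df[term]) for term in tf}
        )
    return weights


# Hypothetical three-document collection
docs = [
    ["car", "car", "insurance"],
    ["auto", "insurance"],
    ["best", "car"],
]
weights = tf_idf(docs)
print(weights[0])
```

A term that occurs in every document gets idf = log10(1) = 0, so it contributes nothing to the score, while rare terms are weighted up.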
Vector space models.
Cosine similarity:
sim(d1,d2) = (V(d1) . V(d2)) / (|V(d1)| * |V(d2)|)
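The formula above can be sketched in code, assuming each document vector is stored as a sparse {term: weight} dictionary (for example tf*idf weights). The name `cosine_similarity` and the sample vectors are illustrative assumptions.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    # Dot product: only terms present in both vectors contribute
    dot = sum(w * v2.get(term, 0.0) for term, w in v1.items())
    # Euclidean lengths of the two vectors
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm1 * norm2)


# Hypothetical document vectors
d1 = {"car": 1.0, "insurance": 1.0}
d2 = {"car": 2.0}
print(cosine_similarity(d1, d2))
```

Dividing by the vector lengths removes the effect of document length, so a long document and a short one with the same relative term weights score as identical.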
These models have several refinements, and they can be mixed and matched for both documents and queries to achieve the best results.