Abstract The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that return results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.
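The difference between exact-match locality and similarity-based locality can be illustrated with a minimal sketch. The abstract does not specify the similarity definition, so the one used here (two queries are similar when they contain the same set of lowercased terms) is an assumption made purely for illustration. The names `normalize` and `hit_rates` and the sample trace are also hypothetical.

```python
def normalize(query: str) -> frozenset[str]:
    """Hypothetical similarity key: the set of lowercased query terms.

    This is an assumed stand-in, not the similarity measure from the paper.
    """
    return frozenset(query.lower().split())


def hit_rates(queries: list[str]) -> tuple[float, float]:
    """Return (exact-match hit rate, term-set hit rate) over a query trace."""
    seen_exact: set[str] = set()
    seen_similar: set[frozenset[str]] = set()
    exact_hits = 0
    similar_hits = 0
    for q in queries:
        # A cache hits only when the identical query string was seen before.
        if q in seen_exact:
            exact_hits += 1
        # The similarity check hits when an earlier query had the same term set.
        key = normalize(q)
        if key in seen_similar:
            similar_hits += 1
        seen_exact.add(q)
        seen_similar.add(key)
    n = len(queries)
    return (exact_hits / n, similar_hits / n) if n else (0.0, 0.0)


# Made-up trace for illustration only; not drawn from THOMAS or Excite.
trace = [
    "tax reform bill",
    "Tax Reform Bill",
    "bill tax reform",
    "tax reform bill",
    "health care",
]
exact_rate, similar_rate = hit_rates(trace)
```

On this made-up trace, only the fourth query repeats an earlier one verbatim, while the second, third, and fourth all share a term set with an earlier query, so the similarity-based hit rate is higher than the exact-match rate.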
Zhihong Lu
Shenyang Ligong University
Kathryn S. McKinley
Google (United States)
University of Massachusetts Amherst
Lu et al. (Sat,) studied this question.
synapsesocial.com/papers/6a10efd9ba20d9a181ee8083 — DOI: https://doi.org/10.1145/345508.345591