Abstract The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when a query exactly matches a previous one. Partial replicas are a form of caching that returns results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare the performance of the two mechanisms, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.
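The distinction between exact-match caching and similarity-based matching can be illustrated with a small sketch. This is not the paper's method or its similarity definition: the function names, the toy trace, and the similarity rule (two queries count as similar when they contain the same set of lowercased terms) are all assumptions chosen for illustration.

```python
from __future__ import annotations


def normalize(query: str) -> frozenset[str]:
    """Reduce a query to its set of distinct lowercased terms (assumed rule)."""
    return frozenset(query.lower().split())


def exact_hit_rate(trace: list[str]) -> float:
    """Fraction of queries that repeat an earlier query string exactly."""
    seen: set[str] = set()
    hits = 0
    for q in trace:
        if q in seen:
            hits += 1
        seen.add(q)
    return hits / len(trace) if trace else 0.0


def similar_hit_rate(trace: list[str]) -> float:
    """Fraction of queries whose term set equals that of an earlier query."""
    seen: set[frozenset[str]] = set()
    hits = 0
    for q in trace:
        terms = normalize(q)
        if terms in seen:
            hits += 1
        seen.add(terms)
    return hits / len(trace) if trace else 0.0


# Hypothetical toy trace, not taken from THOMAS or Excite.
trace = [
    "tax reform bill",
    "Tax Reform Bill",
    "bill tax reform",
    "tax reform bill",
    "health care",
]

print(f"exact-match hit rate: {exact_hit_rate(trace):.2f}")
print(f"similarity hit rate:  {similar_hit_rate(trace):.2f}")
```

On this toy trace the similarity rule finds more repeated queries than exact string matching does, which is the kind of added locality the abstract attributes to partial replicas. A real IR system would use its own ranking or matching technology rather than this simple term-set comparison.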
University of Massachusetts Amherst
Lu et al. (Sat,) studied this question.