This paper examines whether the Cranfield evaluation methodology is robust to gross violations of the completeness assumption (i.e., the assumption that all relevant documents within a test collection have been identified and are present in the collection). We show that current evaluation measures are not robust to substantially incomplete relevance judgments. A new measure is introduced that is both highly correlated with existing measures when complete judgments are available and more robust to incomplete judgment sets. This finding suggests that substantially larger or dynamic test collections built using current pooling practices should be viable laboratory tools, despite the fact that the relevance information will be incomplete and imperfect.
Buckley and Voorhees studied this question.
synapsesocial.com/papers/6a0e9ec31c5e2d2319f99e98 — DOI: https://doi.org/10.1145/1008992.1009000
Chris Buckley
PAREXEL International (United Kingdom)
Ellen M. Voorhees
National Institute of Standards and Technology