Large language models (LLMs) can assess document and query characteristics, including relevance, and are increasingly being used for other classification and labeling tasks as well. This study explores how LLMs can be used to classify an information need, which is often represented as a user query. In particular, our goal is to classify the cognitive complexity of the search task described by a given "backstory". Using 180 TREC topics and backstories, we show that GPT-based LLMs agree with human experts as much as other human experts do. We also show that batching and ordering can significantly affect the accuracy of GPT-3.5, but rarely alter the quality of GPT-4 predictions. This study provides insights into the efficacy of LLMs for annotation tasks normally completed by humans, and offers recommendations for other similar applications.
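The claim that LLMs agree with human experts "as much as other human experts" implies comparing LLM-to-human agreement against human-to-human agreement. The abstract does not name the agreement statistic, so the sketch below assumes Cohen's kappa purely for illustration; it is not presented as the authors' method. The function name `cohen_kappa` and the "low"/"mid"/"high" complexity labels are hypothetical placeholders, not the paper's label scheme.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators (an assumed metric, for illustration)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, and returning 1.0 is a convention, not a standard.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical complexity labels for six backstories (not data from the paper).
expert_a = ["low", "mid", "high", "mid", "low", "high"]
expert_b = ["low", "mid", "mid", "mid", "low", "high"]
llm = ["low", "high", "high", "mid", "low", "high"]

print(round(cohen_kappa(expert_a, expert_b), 3))  # human vs. human
print(round(cohen_kappa(expert_a, llm), 3))       # human vs. LLM
```

With these made-up labels, both pairs come out at roughly 0.75, which is the kind of parity the abstract describes. A chance-corrected statistic is used here rather than raw percent agreement because a model that always predicts the most common label can score high raw agreement without tracking the experts' judgments.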
Zendel et al. studied this question.
www.synapsesocial.com/papers/68e74e1db6db6435876c6fc2 — DOI: https://doi.org/10.1145/3627508.3638322
Oleg Zendel
J. Shane Culpepper
Falk Scholer
The University of Queensland
RMIT University