July 21, 2025 · Open Access

LLM-as-a-Judge: automated evaluation of search query parsing using large language models

Abstract

Introduction: The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches. Methods: We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations. Results: Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments. Discussion: These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, providing robust query parsing for real-world search systems.
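As an illustration of the Pass/Fail assessment and the human-agreement figure reported above, the following is a minimal sketch and not the authors' implementation. The prompt wording, the `ParsedQuery` type, the `Judge` callable, and the example query are all hypothetical; a real setup would replace the stub judge with a call to an actual LLM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type: any function that maps a prompt string to the judge's text reply.
Judge = Callable[[str], str]


@dataclass
class ParsedQuery:
    query: str    # raw user search query
    parsed: dict  # structured output produced by the parser under test


def pass_fail_prompt(item: ParsedQuery) -> str:
    """Build an illustrative Pass/Fail prompt (wording is an assumption)."""
    return (
        "You are evaluating a search query parser.\n"
        f"Query: {item.query}\n"
        f"Parsed output: {item.parsed}\n"
        "Answer with exactly PASS or FAIL."
    )


def judge_pass_fail(item: ParsedQuery, judge: Judge) -> bool:
    """Return True when the judge's reply starts with PASS (case-insensitive)."""
    reply = judge(pass_fail_prompt(item)).strip().upper()
    return reply.startswith("PASS")


def agreement_rate(llm: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of items on which the LLM verdict equals the human verdict."""
    if len(llm) != len(human) or len(llm) == 0:
        raise ValueError("verdict lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(llm, human))
    return matches / len(llm)


if __name__ == "__main__":
    # Stub judge standing in for a real LLM call; it always answers PASS.
    stub_judge: Judge = lambda prompt: "PASS"
    item = ParsedQuery(
        query="red bike under 200",
        parsed={"color": "red", "category": "bike", "max_price": 200},
    )
    print(judge_pass_fail(item, stub_judge))
    print(agreement_rate([True, False, True, True], [True, False, False, True]))
```

Pointwise and Pairwise assessments could follow the same pattern, with the prompt asking for a numeric score or a preference between two candidate parses instead of a PASS/FAIL verdict.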

Cite This Study

Baysan et al. (2025) studied this question.

synapsesocial.com/papers/6a19252ac05413006f57ed83 https://doi.org/10.3389/fdata.2025.1611389
