What question did this study set out to answer?

The study asks how to find NL2SQL pipeline configurations that balance execution accuracy against monetary cost for a given database schema.

April 10, 2026 | Open Access

PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL

Key points

Developed the PRISM framework to identify high-accuracy, cost-efficient NL2SQL configurations tailored to each database schema.
Used cost-aware Bayesian Optimization in an offline phase to explore the configuration space and curate a pool of high-performing pipelines.
Deploys the curated configurations online either as a single cost-effective candidate or as an ensemble that maximizes accuracy.
Achieved 69.48% execution accuracy on the BIRD benchmark in the single-candidate setting, a 2.34% improvement over the strongest baseline.
Reduced cost by 92% relative to that baseline in the same setting.
In the ensemble setting, accuracy rose further to 74.9%.

Abstract

Large language models (LLMs) have achieved strong performance on natural language to SQL (NL2SQL) tasks, but their practical effectiveness depends on tuning a complex pipeline of interacting components. Real-world deployments must navigate a critical trade-off between execution accuracy and monetary cost, a factor that has been largely overlooked by prior work focused primarily on maximizing accuracy. Navigating this trade-off is non-trivial: the ideal configuration of components (e.g., LLM, prompting strategy, schema linking) is not only interdependent but also highly sensitive to the target database schema. This creates a challenging, schema-aware configuration tuning problem that lacks a systematic solution. We present PRISM, a framework that systematically identifies high-accuracy, cost-efficient NL2SQL configurations tailored to each schema. Adopting an optimize-then-deploy strategy, PRISM first uses cost-aware Bayesian Optimization in an offline phase to efficiently explore the configuration space and curate a pool of high-performing pipelines. In an online phase, it deploys these configurations either as a single, cost-effective candidate or as an ensemble to maximize accuracy. Experiments on the BIRD benchmark demonstrate that PRISM achieves 69.48% execution accuracy in the single-candidate setting, improving accuracy by 2.34% over the strongest baseline while reducing cost by 92%. In the ensemble setting, PRISM boosts accuracy further to 74.9%.
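The optimize-then-deploy idea can be illustrated with a minimal Python sketch. It is not the PRISM implementation: the abstract does not say how the pool is curated, how the single candidate is chosen, or how ensemble outputs are combined. The sketch assumes a Pareto filter over (accuracy, cost) as a stand-in for the offline Bayesian Optimization search, a per-query budget for the single-candidate choice, and a majority vote over execution results for the ensemble. The `Config` class, the function names, and all numbers are hypothetical and are not results from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One evaluated pipeline configuration (all fields are illustrative)."""
    name: str
    accuracy: float  # execution accuracy measured offline, in [0, 1]
    cost: float      # average monetary cost per query, e.g. in USD


def pareto_pool(configs):
    """Keep configurations that no other configuration dominates.

    A configuration is dominated if another one is at least as accurate and
    at least as cheap, and strictly better on one of the two.
    """
    pool = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy and o.cost <= c.cost
            and (o.accuracy > c.accuracy or o.cost < c.cost)
            for o in configs
        )
        if not dominated:
            pool.append(c)
    return sorted(pool, key=lambda c: c.cost)


def pick_single(pool, budget):
    """Most accurate configuration whose per-query cost fits the budget."""
    affordable = [c for c in pool if c.cost <= budget]
    if not affordable:
        raise ValueError("no configuration fits the budget")
    return max(affordable, key=lambda c: c.accuracy)


def ensemble_vote(results):
    """Return the most common execution result among candidate queries.

    `results` holds one hashable execution result per candidate (for example
    a frozenset of rows); ties go to the earliest candidate.
    """
    counts = Counter(results)
    best = max(counts.values())
    return next(r for r in results if counts[r] == best)


if __name__ == "__main__":
    # Made-up offline measurements, for illustration only.
    evaluated = [
        Config("small-llm/zero-shot", accuracy=0.58, cost=0.001),
        Config("small-llm/few-shot", accuracy=0.63, cost=0.003),
        Config("large-llm/zero-shot", accuracy=0.62, cost=0.020),
        Config("large-llm/few-shot", accuracy=0.70, cost=0.030),
    ]
    pool = pareto_pool(evaluated)
    print([c.name for c in pool])
    print(pick_single(pool, budget=0.005).name)
    print(ensemble_vote([frozenset({(3,)}), frozenset({(5,)}), frozenset({(3,)})]))
```

In this toy data the large zero-shot configuration drops out of the pool because a cheaper configuration is also more accurate, which is the kind of cost-accuracy interaction the abstract describes. Voting on execution results rather than SQL text lets syntactically different but equivalent queries agree.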
