Large language models (LLMs) have achieved strong performance on natural-language-to-SQL (NL2SQL) tasks, but their practical effectiveness depends on tuning a complex pipeline of interacting components. Real-world deployments must navigate a critical trade-off between execution accuracy and monetary cost, a factor largely overlooked by prior work, which has focused primarily on maximizing accuracy. Navigating this trade-off is non-trivial: the pipeline components (e.g., LLM, prompting strategy, schema linking) are interdependent, and the ideal configuration is highly sensitive to the target database schema. The result is a challenging, schema-aware configuration tuning problem that lacks a systematic solution. We present PRISM, a framework that systematically identifies high-accuracy, cost-efficient NL2SQL configurations tailored to each schema. Adopting an optimize-then-deploy strategy, PRISM first uses cost-aware Bayesian Optimization in an offline phase to efficiently explore the configuration space and curate a pool of high-performing pipelines. In an online phase, it deploys these configurations either as a single, cost-effective candidate or as an ensemble to maximize accuracy. Experiments on the BIRD benchmark demonstrate that PRISM achieves 69.48% execution accuracy in the single-candidate setting, improving accuracy by 2.34% over the strongest baseline while reducing cost by 92%. In the ensemble setting, PRISM boosts accuracy further to 74.9%.
Kakkar et al. (Thu,) studied this question.
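As a rough illustration of the optimize-then-deploy idea described above, the following minimal sketch assumes that an offline search has already produced a set of evaluated configurations, each with a validation accuracy and a per-query cost. It does not implement Bayesian Optimization and is not PRISM's actual procedure: the `Candidate` type, the Pareto-style pool filter, the budget-based single-candidate choice, and the majority-vote ensemble are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical, already-evaluated pipeline configuration."""
    name: str
    accuracy: float  # execution accuracy on a validation set, in [0, 1]
    cost: float      # average monetary cost per query (arbitrary units)


def pareto_pool(candidates):
    """Keep candidates that no other candidate beats on both accuracy and cost."""
    pool = []
    for c in candidates:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.cost <= c.cost
            and (o.accuracy > c.accuracy or o.cost < c.cost)
            for o in candidates
        )
        if not dominated:
            pool.append(c)
    # Cheapest first, so the pool reads as an accuracy/cost frontier.
    return sorted(pool, key=lambda c: c.cost)


def pick_single(pool, budget):
    """Single-candidate mode (assumed rule): most accurate candidate within budget."""
    affordable = [c for c in pool if c.cost <= budget]
    return max(affordable, key=lambda c: c.accuracy, default=None)


def majority_vote(results):
    """Ensemble mode (assumed rule): most common execution result among members.

    Each result must be hashable, e.g. a tuple of result rows.
    """
    return Counter(results).most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up numbers, purely for demonstration.
    evaluated = [
        Candidate("small-llm/zero-shot", 0.60, 1.0),
        Candidate("small-llm/few-shot", 0.65, 5.0),
        Candidate("small-llm/no-linking", 0.55, 2.0),
        Candidate("large-llm/few-shot", 0.70, 20.0),
    ]
    pool = pareto_pool(evaluated)
    print([c.name for c in pool])
    print(pick_single(pool, budget=6.0))
    print(majority_vote([(1,), (2,), (1,)]))
```

Separating pool curation from the deployment rule mirrors the offline/online split: the expensive search runs once per schema, while the choice between a single cheap candidate and an ensemble can be made at deployment time.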