Translating natural language into SQL is essential for intuitive database access, yet open-source small language models (SLMs) still lag behind larger systems when faced with complex schemas and tight context windows. This paper introduces a two-phase workflow designed to enhance the Text-to-SQL capabilities of SLMs. Phase 1 (offline) transforms the database schema into a graph, partitions it with Louvain community detection, and enriches each component in a cluster with metadata, relationships, and sample rows. Phase 2 (at runtime) selects the relevant tables, generates a SQL query, and iteratively refines it through an execution-driven feedback loop until the query executes successfully. Evaluated on the Spider test set, the pipeline raises Qwen-2.5-Coder-14B to 86.2% Execution Accuracy (EX), surpassing its zero-shot baseline, outperforming all contemporary SLM + in-context learning (ICL) approaches, and narrowing the gap to GPT-4-based systems, all while running on consumer-grade hardware. Ablation studies confirm that both schema enrichment and self-correction contribute significantly to the improvement. The study concludes that this workflow provides a practical methodology for deploying resource-efficient open-source SLMs in Text-to-SQL applications, mitigating the challenges posed by complex schemas and tight context windows. An open-source implementation is released to support further research.
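The execution-driven feedback loop of Phase 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: SQLite is assumed as the execution engine, and `refine_until_executable`, `toy_generator`, the `singer` table, and the `max_rounds` limit are hypothetical names and values introduced only for illustration. The generator below is a hard-coded stand-in for the SLM so that the example runs without a model.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def refine_until_executable(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str], Optional[str]], str],
    max_rounds: int = 3,  # assumed retry budget, not taken from the paper
) -> Tuple[str, List[tuple]]:
    """Generate SQL, execute it, and feed any database error back to the generator."""
    previous_sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate_sql(question, previous_sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows  # stop as soon as the query executes
        except sqlite3.Error as exc:
            # Keep the failed query and the error message for the next attempt.
            previous_sql, error = sql, str(exc)
    raise RuntimeError(f"no executable query after {max_rounds} rounds: {error}")


def toy_generator(question: str, previous_sql: Optional[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for the SLM: the first attempt misspells a column."""
    if error is None:
        return "SELECT nme FROM singer WHERE age > 30"
    return "SELECT name FROM singer WHERE age > 30"


# Tiny in-memory database used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer (name, age) VALUES (?, ?)", [("Ana", 41), ("Bo", 25)]
)

final_sql, rows = refine_until_executable(
    conn, "Which singers are older than 30?", toy_generator
)
```

In this sketch the first query fails because the column `nme` does not exist, the error text is handed back to the generator, and the second attempt executes. A real pipeline would replace `toy_generator` with a model call whose prompt includes the failed query and the error message.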
Le Gia Kiet
Le Quoc Khanh
Nguyen Minh Nhut
CTU Journal of Innovation and Sustainable Development
Kiet et al. studied this question.
www.synapsesocial.com/papers/68f3eb011cfc5ad53f290961 — DOI: https://doi.org/10.22144/ctujoisd.2025.058