What question did this study set out to answer?

The study aims to improve the testing of SMT solvers by using LLMs to generate test formulas from historical bug-triggering inputs.

February 11, 2026

Testing like Mad Libs: Fuzzing SMT Solvers with Historical Unusual Inputs Empowered by LLMs

Key Points

Developed MadFuzz, which synthesizes test formulas by having an LLM complete the skeletons of historical bug-triggering inputs.
Implemented a feedback mechanism based on solver behavior that combines LLM-guided prompt optimization with sampling-based seed input selection.
Evaluated MadFuzz on three SMT solvers: Z3, cvc5, and Bitwuzla.
MadFuzz found 20 confirmed real bugs in these solvers, 19 of which have already been fixed.
MadFuzz outperformed existing SMT solver fuzzers in both code coverage and bug detection.

Abstract

Ensuring the correctness of Satisfiability Modulo Theory (SMT) solvers is of paramount importance, as it serves as the cornerstone of a broad spectrum of critical software engineering practices. Consequently, various methodologies have been proposed to test SMT solvers, including recent advances that utilize Large Language Models (LLMs) to generate input formulas for SMT solver fuzzing. However, directly employing LLMs to craft SMT formulas will result in an abundance of invalid inputs. Furthermore, our recent study demonstrated the effectiveness of using historical bug-triggering inputs to guide the generation of test formulas, as these inputs help exercise solvers’ deep states. Building on these insights, we present MadFuzz, an enhanced approach that leverages the capabilities of LLMs to ingeniously synthesize test formulas by completing the skeletons of historical bug-triggering inputs. Moreover, MadFuzz incorporates a feedback mechanism based on solvers’ behavior, which consists of LLM-guided prompt optimization and sampling-based seed input selection. To evaluate the effectiveness of MadFuzz, we conducted a comprehensive assessment using three cutting-edge SMT solvers: Z3, cvc5, and Bitwuzla. Our results demonstrate that MadFuzz has identified 20 confirmed real bugs in the solvers, 19 of which have already been addressed by developers. Additionally, our experiments demonstrate that MadFuzz surpasses existing SMT solver fuzzers in code coverage and bug detection capability.
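
To make the "Mad Libs" idea in the abstract concrete, the sketch below shows one possible way to turn a seed formula into a skeleton with holes, fill the holes again, and pick the next seed by weighted sampling. It is only an illustration under stated assumptions, not MadFuzz's actual implementation: the `<HOLE>` token, the choice to mask only literals, the function names `make_skeleton`, `fill_skeleton`, and `pick_seed`, and the per-seed scores are all hypothetical. In MadFuzz the holes are completed by an LLM; here the fillers are passed in by hand.

```python
import random
import re

# Hypothetical placeholder token; the abstract does not specify the
# skeleton format that MadFuzz uses.
HOLE = "<HOLE>"

# Assumption: only SMT-LIB literals (binary, hex, numerals, decimals)
# are masked. A real skeletonizer would likely mask larger sub-terms.
LITERAL = re.compile(r"#b[01]+|#x[0-9a-fA-F]+|\b\d+(?:\.\d+)?\b")


def make_skeleton(formula: str) -> str:
    """Replace every literal in the formula with a hole."""
    return LITERAL.sub(HOLE, formula)


def fill_skeleton(skeleton: str, fillers: list[str]) -> str:
    """Fill the holes left to right with the given terms.

    In MadFuzz an LLM proposes the fillers; here they are supplied
    by the caller.
    """
    parts = skeleton.split(HOLE)
    if len(fillers) != len(parts) - 1:
        raise ValueError("need exactly one filler per hole")
    pieces = [parts[0]]
    for filler, rest in zip(fillers, parts[1:]):
        pieces.append(filler)
        pieces.append(rest)
    return "".join(pieces)


def pick_seed(seeds: list[str], scores: list[float], rng: random.Random) -> str:
    """Sample one seed with probability proportional to its score.

    The scores stand in for feedback from solver behavior (for example,
    new coverage); how MadFuzz computes them is not stated in the abstract.
    """
    return rng.choices(seeds, weights=scores, k=1)[0]


if __name__ == "__main__":
    seed = "(declare-const x Int)\n(assert (> (* x 3) 10))\n(check-sat)"
    skeleton = make_skeleton(seed)
    print(skeleton)
    print(fill_skeleton(skeleton, ["7", "(- 1)"]))
```

Keeping the structure of a historical bug-triggering input and varying only the holes is what lets a generated formula stay close to a known problematic shape while still differing from the original.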
