Ensuring the correctness of Satisfiability Modulo Theory (SMT) solvers is of paramount importance, as it serves as the cornerstone of a broad spectrum of critical software engineering practices. Consequently, various methodologies have been proposed to test SMT solvers, including recent advances that utilize Large Language Models (LLMs) to generate input formulas for SMT solver fuzzing. However, directly employing LLMs to craft SMT formulas will result in an abundance of invalid inputs. Furthermore, our recent study demonstrated the effectiveness of using historical bug-triggering inputs to guide the generation of test formulas, as these inputs help exercise solvers’ deep states. Building on these insights, we present MadFuzz , an enhanced approach that leverages the capabilities of LLMs to ingeniously synthesize test formulas by completing the skeletons of historical bug-triggering inputs. Moreover, MadFuzz incorporates a feedback mechanism based on solvers’ behavior, which consists of LLM-guided prompt optimization and sampling-based seed input selection. To evaluate the effectiveness of MadFuzz , we conducted a comprehensive assessment using three cutting-edge SMT solvers: Z3, cvc5, and Bitwuzla. Our results demonstrate that MadFuzz has identified 20 confirmed real bugs in the solvers, 19 of which have already been addressed by developers. Additionally, our experiments demonstrate that MadFuzz surpasses existing SMT solver fuzzers in code coverage and bug detection capability.
Sun et al. (Mon,) studied this question.