Compiler testing is crucial because compilers are foundational infrastructure in software development. Effective compiler testing requires not only diverse test programs but also a systematic choice of compilation options. Existing research predominantly focuses on diversifying test programs and largely overlooks the strategic selection of compilation options needed for comprehensive testing. Compilers typically offer a wide range of fine-grained compilation options that give precise control over the compilation process, which yields a vast combination space. Exhaustively enumerating all option combinations is computationally infeasible, while randomly generating conflict-free, semantically meaningful option sets is methodologically challenging. To address these limitations, we propose OptFuzz, a compiler testing framework that combines the generative capabilities of Large Language Models (LLMs) with the effectiveness of historical bug-triggering test programs to comprehensively explore the compilation space. First, OptFuzz uses LLMs to extract historical bug-triggering test programs, which have been shown to be effective at uncovering compiler bugs, from diverse bug reports; using LLMs for this step overcomes the limitations of existing regex-based extraction techniques. Next, to work within the limited context length of LLMs, OptFuzz applies a code abstraction method based on intermediate representation (IR). Finally, the extracted IR is fed into an LLM to generate effective compilation options for compiler testing. In extensive experiments on GCC and LLVM, OptFuzz showed stronger bug detection capability than random compilation space exploration and other existing techniques. Notably, OptFuzz discovered 64 new bugs in GCC and LLVM, 53 of which have been confirmed or fixed, highlighting the practical utility of our method.
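To make the size of the option space concrete, the following minimal Python sketch models the kind of random compilation space exploration used as a baseline. The flag pool, the three-state encoding (unset, forced on, forced off), and the function names are illustrative assumptions, not OptFuzz's implementation. With only five flags and five optimization levels there are already 5 × 3^5 = 1215 combinations, while GCC alone exposes hundreds of optimization options.

```python
import random

# Illustrative pool of GCC-style optimization flags (real GCC flag names, but the
# pool itself is an assumption). Each flag can be enabled (-fX) or disabled (-fno-X).
FLAG_POOL = [
    "unroll-loops",
    "tree-vectorize",
    "inline-functions",
    "omit-frame-pointer",
    "strict-aliasing",
]
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def combination_space_size(num_flags: int, num_levels: int) -> int:
    """Each flag is unset, forced on, or forced off: 3 choices per flag."""
    return num_levels * 3 ** num_flags


def sample_options(rng: random.Random) -> list[str]:
    """Randomly sample one -O level plus a conflict-free set of -f/-fno- flags.

    Choosing exactly one state per flag guarantees that -fX and -fno-X never
    appear together; it does NOT guarantee the combination is semantically
    meaningful (for example, many flags have no effect at -O0).
    """
    options = [rng.choice(OPT_LEVELS)]
    for flag in FLAG_POOL:
        state = rng.choice(["unset", "on", "off"])
        if state == "on":
            options.append(f"-f{flag}")
        elif state == "off":
            options.append(f"-fno-{flag}")
    return options


if __name__ == "__main__":
    rng = random.Random(0)
    print(combination_space_size(len(FLAG_POOL), len(OPT_LEVELS)))  # 5 * 3**5 = 1215
    for _ in range(3):
        print(" ".join(sample_options(rng)))
```

Picking a single state per flag rules out direct -fX/-fno-X conflicts, but random sampling still has no notion of which options interact with a given test program, which is the gap that LLM-guided option generation aims to close.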
The experimental results also indicate that IR-based analysis substantially reduces overhead and improves bug detection compared with feeding source code directly to the LLM.
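The abstract does not specify how the IR-based code abstraction works. As one illustration, assuming the abstraction starts from textual LLVM IR (for example, produced with `clang -S -emit-llvm -g`), a minimal Python sketch can drop content that plausibly does not affect option selection, such as comments, debug intrinsics, module-level metadata, and attribute groups, before the IR is placed in a prompt. The function name `abstract_ir`, the sample module, and the choice of what to drop are hypothetical and are not OptFuzz's actual method.

```python
import re

# Whole lines assumed irrelevant to option selection (hypothetical choice):
# comments, module-level metadata, attribute groups, target info, and
# debug-intrinsic declarations/calls.
_DROP_LINE = re.compile(
    r"^\s*(?:;|!|attributes\s+#\d+|source_filename|target\s+(?:datalayout|triple)"
    r"|declare\s+void\s+@llvm\.dbg\.|(?:tail\s+)?call\s+void\s+@llvm\.dbg\.)"
)
# Metadata attachments such as ", !dbg !17", ", !tbaa !5", or " !dbg !10".
_MD_ATTACHMENT = re.compile(r",?\s+![\w.]+\s+!\d+")
# Approximate match for attribute-group references, e.g. the "#0" in "define ... #0 {".
_ATTR_REF = re.compile(r"\s+#\d+\b")


def abstract_ir(ir_text: str) -> str:
    """Return a trimmed copy of textual LLVM IR for use in an LLM prompt."""
    kept = []
    for line in ir_text.splitlines():
        if not line.strip() or _DROP_LINE.match(line):
            continue  # skip blank lines and lines judged irrelevant
        line = _MD_ATTACHMENT.sub("", line)
        line = _ATTR_REF.sub("", line)
        kept.append(line.rstrip())
    return "\n".join(kept)


# Small hand-written module with debug info, used only as an example input.
SAMPLE_IR = """\
; ModuleID = 'test.c'
source_filename = "test.c"
target triple = "x86_64-pc-linux-gnu"

define dso_local i32 @main() #0 !dbg !10 {
  %1 = alloca i32, align 4
  call void @llvm.dbg.declare(metadata ptr %1, metadata !15, metadata !DIExpression()), !dbg !17
  store i32 0, ptr %1, align 4, !dbg !17
  %2 = load i32, ptr %1, align 4, !dbg !18
  ret i32 %2, !dbg !19
}

declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

attributes #0 = { noinline nounwind }
!llvm.dbg.cu = !{!0}
!10 = distinct !DISubprogram(name: "main")
"""

if __name__ == "__main__":
    print(abstract_ir(SAMPLE_IR))
```

Line-based regular expressions are enough for a sketch, but a production tool would more likely operate on the parsed module so that stripping cannot corrupt string constants or other constructs the patterns do not anticipate.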
Huang et al. studied this question.