This work presents VerifEval, an end-to-end evaluation pipeline for AI-generated hardware verification environments. VerifEval measures static quality, executable fidelity, structural coverage, trace-based coverage, and mutation sensitivity across SystemVerilog/UVM and cocotb/pyuvm testbenches. We evaluate multiple large language model baselines on five OpenCores designs and show that structural coverage and verification quality are complementary metrics, with significant gaps remaining in planning and completeness.
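As an illustration of what a mutation-sensitivity measurement might look like, the sketch below computes a conventional mutation score: the fraction of injected design mutants that a testbench detects. This is an assumption about the general technique, not VerifEval's actual implementation; the `MutantResult` type, the `mutation_score` function, and the mutant names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MutantResult:
    """Outcome of running a testbench against one mutated design (hypothetical type)."""
    name: str
    killed: bool  # True if at least one test failed on the mutated design


def mutation_score(results: List[MutantResult]) -> float:
    """Return the fraction of mutants the testbench detected (killed)."""
    if not results:
        raise ValueError("no mutants evaluated")
    killed = sum(1 for r in results if r.killed)
    return killed / len(results)


# Illustrative data only; these mutants are invented for the example.
results = [
    MutantResult("flip_reset_polarity", True),
    MutantResult("swap_and_or", True),
    MutantResult("off_by_one_counter", False),
    MutantResult("stuck_at_zero_output", True),
]
score = mutation_score(results)
print(f"mutation score: {score:.2f}")
```

A testbench can reach high structural coverage while leaving mutants alive, which is one way the two kinds of metric could turn out to be complementary.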
Razzaque et al. (Tue,) studied this question.