What question did this study set out to answer?

The study set out to build VerifEval, an end-to-end pipeline for evaluating AI-generated hardware verification environments.

May 7, 2026 | Open Access

VerifEval: End-to-End Evaluation of AI-Generated Hardware Verification with Lint, Simulation Coverage, Trace Coverage, and Formal-Driven Mutation

Key Points

VerifEval is an end-to-end pipeline for evaluating AI-generated hardware verification environments.
It measures static quality, executable fidelity, structural coverage, trace-based coverage, and mutation sensitivity.
It covers both SystemVerilog/UVM and cocotb/pyuvm testbenches.
Multiple large language model baselines were evaluated on five OpenCores designs.
Structural coverage and verification quality proved to be complementary metrics.
Significant gaps remain in the planning and completeness of AI-generated verification environments.

Abstract

This work presents VerifEval, an end-to-end evaluation pipeline for AI-generated hardware verification environments. VerifEval measures static quality, executable fidelity, structural coverage, trace-based coverage, and mutation sensitivity across SystemVerilog/UVM and cocotb/pyuvm testbenches. We evaluate multiple large language model baselines on five OpenCores designs and show that structural coverage and verification quality are complementary metrics, with significant gaps remaining in planning and completeness.
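
The claim that structural coverage and verification quality are complementary can be illustrated with a toy model. The sketch below is not VerifEval's implementation: the 8-bit adder, the three mutants, and the two testbench functions are hypothetical stand-ins written in plain Python rather than SystemVerilog or cocotb. It assumes the usual definition of a mutation score, the fraction of mutants that a testbench detects. Both testbenches drive the same stimulus, so they exercise the design identically, but only the one that checks outputs detects any mutant.

```python
from typing import Callable, List, Tuple

Dut = Callable[[int, int], int]

def adder(a: int, b: int) -> int:
    """Hypothetical reference design: 8-bit adder with wraparound."""
    return (a + b) & 0xFF

# Hypothetical mutants, each injecting one small functional fault.
def mutant_sub(a: int, b: int) -> int:
    return (a - b) & 0xFF

def mutant_or(a: int, b: int) -> int:
    return (a | b) & 0xFF

def mutant_no_wrap(a: int, b: int) -> int:
    return a + b

MUTANTS: List[Dut] = [mutant_sub, mutant_or, mutant_no_wrap]

# Shared stimulus: both testbenches exercise the design the same way.
STIMULUS: List[Tuple[int, int]] = [(0, 0), (1, 2), (200, 100), (255, 1)]

def tb_no_checks(dut: Dut) -> bool:
    """Drives every stimulus but never inspects the result."""
    for a, b in STIMULUS:
        dut(a, b)
    return True  # always "passes"

def tb_with_checks(dut: Dut) -> bool:
    """Drives the same stimulus and compares against a reference model."""
    return all(dut(a, b) == (a + b) % 256 for a, b in STIMULUS)

def mutation_score(tb: Callable[[Dut], bool], mutants: List[Dut]) -> float:
    """Fraction of mutants the testbench detects (a failing run kills the mutant)."""
    killed = sum(1 for m in mutants if not tb(m))
    return killed / len(mutants)

if __name__ == "__main__":
    print("no checks:  ", mutation_score(tb_no_checks, MUTANTS))
    print("with checks:", mutation_score(tb_with_checks, MUTANTS))
```

In this toy setting a coverage tool would report the same figure for both testbenches, since they execute the same code paths, while the mutation score separates them. That is the sense in which the two kinds of metric carry different information.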
