March 3, 2026
Open Access

Advancing RTL simulation and enabling hardware decompilation via HPC and PL Techniques

Key Points

Register-transfer level (RTL) simulation with the Metro-MPI tool reaches a throughput of 2.7 million instructions per second (MIPS) on a design with more than 10 billion transistors.
The simulation runs on 1000+ physical cores, with parallelism extracted by partitioning the design along its natural boundaries, such as latency-insensitive interfaces.
The hardware decompilation project focuses on recovering higher-level programming abstractions from low-level netlists.
Findings may enable faster simulations and improved understanding and analysis of integrated circuits.

Abstract

This talk describes two adjacent projects from our team. The first applies high-performance computing (HPC) techniques to enable scalable RTL simulation for 10-billion-transistor-scale designs. Our tool, Metro-MPI, exploits the natural boundaries present in chip designs (such as latency-insensitive interfaces) to partition RTL simulations, and uses HPC techniques to extract parallelism. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput, for the first time, on a design with more than 10 billion transistors and 1,024 Linux-capable cores, simulated on 1000+ physical cores.

The second project introduces the problem of hardware decompilation: analysing a low-level artifact (a netlist) to recover higher-level programming abstractions, and using those abstractions to generate code written in a hardware description language (HDL). To start attacking this problem, we focus on hardware loop rerolling: identifying repeated logic in netlists (such as would be synthesized from loops in the original HDL code) and rerolling it into syntactic loops in the recovered HDL code. This not only enables faster simulation, but also opens opportunities for transpilation between HDLs, compaction of netlists, understanding and analysis of netlists, and more.
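
To make the idea of loop rerolling concrete, here is a minimal sketch in Python. It is not the method presented in the talk: the toy netlist format, the names `NETLIST`, `template_of` and `reroll`, and the matching heuristic (gates that differ only in a single bit index) are all assumptions chosen for illustration. The sketch groups gates whose nets are identical once the index is abstracted to `i`, and emits a Verilog-style `for` loop when a group covers a contiguous index range.

```python
import re
from collections import defaultdict

# Hypothetical toy netlist: (gate_type, output_net, input_nets).
NETLIST = [
    ("and", "y[0]", ("a[0]", "b[0]")),
    ("and", "y[1]", ("a[1]", "b[1]")),
    ("and", "y[2]", ("a[2]", "b[2]")),
    ("and", "y[3]", ("a[3]", "b[3]")),
    ("or", "z", ("y[0]", "y[3]")),
]

INDEX = re.compile(r"\[(\d+)\]")


def template_of(gate):
    """Abstract a gate to (template, index) if all its indexed nets share one index.

    Returns (None, None) when the gate uses no index or several different ones.
    """
    gate_type, out, ins = gate
    nets = (out,) + tuple(ins)
    indices = {int(m) for net in nets for m in INDEX.findall(net)}
    if len(indices) != 1:
        return None, None
    abstract_nets = tuple(INDEX.sub("[i]", net) for net in nets)
    return (gate_type, abstract_nets), indices.pop()


def reroll(netlist):
    """Return HDL-like text where repeated, index-parallel gates become a loop."""
    groups = defaultdict(list)
    leftovers = []
    for gate in netlist:
        template, idx = template_of(gate)
        if template is None:
            leftovers.append(gate)
        else:
            groups[template].append((idx, gate))

    lines = []
    for n, ((gate_type, nets), members) in enumerate(groups.items()):
        idxs = sorted(idx for idx, _ in members)
        lo, hi = idxs[0], idxs[-1]
        # Reroll only if the group covers a contiguous range of at least two indices.
        if len(idxs) >= 2 and idxs == list(range(lo, hi + 1)):
            lines += [
                f"for (i = {lo}; i <= {hi}; i = i + 1) begin : loop{n}",
                f"  {gate_type} g ({', '.join(nets)});",
                "end",
            ]
        else:
            leftovers += [gate for _, gate in members]

    # Gates that did not fit any loop are emitted one by one.
    for n, (gate_type, out, ins) in enumerate(leftovers):
        lines.append(f"{gate_type} u{n} ({', '.join((out,) + tuple(ins))});")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reroll(NETLIST))
```

On the toy input, the four `and` gates collapse into one loop over `i` from 0 to 3, while the `or` gate, which mixes two indices, stays as a single instance. The emitted text is a fragment that would sit inside a `generate` region with `genvar i` declared. A real netlist would need structural matching that does not rely on net names, since synthesis rarely preserves them.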
