What question did this study set out to answer?

The aim is to automate and optimize the deployment of deep neural networks on hardware with limited resources while maintaining performance and accuracy.

February 11, 2026

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

Key Points

Developed a unified framework for codifying optimization strategies.
Integrated DNN optimization techniques with high-level synthesis-based metaprogramming.
Utilized Bayesian optimization for design space exploration across multiple design flows.
Implemented a cross-stage optimization search approach.
Achieved up to a 92% reduction in DSP usage and an 89% reduction in LUT usage for select networks.
Preserved inference accuracy across optimized networks.
Reduced optimization time 15.6-fold compared to grid search.

Abstract

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy, and resource efficiency. Deploying DNNs on such platforms involves addressing the significant challenge of balancing performance, resource usage (e.g., DSPs and LUTs), and inference accuracy, which often requires extensive manual effort and domain expertise. Our novel approach addresses two core issues: (i) encoding custom optimization strategies and (ii) enabling cross-stage optimization search. In particular, our proposed framework seamlessly integrates programmatic DNN optimization techniques with high-level synthesis (HLS)-based metaprogramming, leveraging advanced design space exploration (DSE) strategies like Bayesian optimization to automate both top-down and bottom-up design flows. Hence, we reduce the need for manual intervention and domain expertise. In addition, the framework introduces customizable optimization, transformation, and control blocks to enhance DNN accelerator performance and resource efficiency. We further formalize a cross-stage, constrained Bayesian optimization procedure that couples predicate–action bottom-up feedback (via BRANCH) with FORK/REDUCE order search, enabling automated selection, ordering, and tuning across software and HLS tasks. Experimental results demonstrate up to a 92% DSP and 89% LUT usage reduction for select networks, while preserving accuracy, along with a 15.6-fold reduction in optimization time compared to grid search. These results highlight the potential for automating the generation of resource-efficient DNN accelerator designs with minimal effort, resulting in large resource savings with bounded exploration cost.
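
The interplay of predicate-action feedback and order search described in the abstract can be illustrated with a small toy model. The sketch below is not the framework's API: the task names (`prune`, `quantize`, `scale`), the function names (`run_flow`, `fork`, `reduce_best`), and every number are hypothetical and chosen only for illustration. It also swaps the paper's constrained Bayesian optimization for plain exhaustive enumeration of task orderings, which is only practical because the toy search space has six candidates.

```python
from itertools import permutations

# Hypothetical tasks: (DSP usage kept, in percent; accuracy cost, in basis points).
# All numbers are made up for illustration; none come from the paper.
TASKS = {"prune": (50, 40), "quantize": (40, 60), "scale": (70, 20)}

BASE_DSP = 1000   # assumed DSP count of the unoptimized design
BASE_ACC = 7600   # assumed baseline accuracy: 76.00%, in basis points
ACC_FLOOR = 7450  # constraint: accuracy must stay at or above 74.50%


def run_flow(order):
    """Apply tasks in the given order with a BRANCH-style check after each one."""
    dsp, acc, applied = BASE_DSP, BASE_ACC, []
    for position, task in enumerate(order):
        keep_pct, acc_cost = TASKS[task]
        # Toy assumption: a task costs more accuracy the later it runs.
        new_acc = acc - acc_cost * (position + 1)
        # Predicate: does this step violate the accuracy constraint?
        if new_acc < ACC_FLOOR:
            continue  # Action: discard the step and keep the previous design.
        dsp, acc = dsp * keep_pct // 100, new_acc
        applied.append(task)
    return {"order": tuple(applied), "dsp": dsp, "acc": acc}


def fork(task_names):
    """FORK-style step: evaluate one flow per candidate task ordering."""
    return [run_flow(order) for order in permutations(task_names)]


def reduce_best(results):
    """REDUCE-style step: keep the result with the lowest DSP usage."""
    return min(results, key=lambda r: r["dsp"])


best = reduce_best(fork(sorted(TASKS)))
print(best)
```

Under these made-up numbers the search settles on `quantize` followed by `prune`, dropping the third task because it would push accuracy below the floor. In a realistic setting each call to `run_flow` would involve training and synthesis runs, which is why a sample-efficient search such as Bayesian optimization matters more than it does in this toy.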
