The lack of physical constraints in metasurface inverse design often leads to inefficient exploration of high-dimensional spaces and to designs that are difficult to realize physically. We propose an automated inverse design framework that integrates physical theory with reinforcement learning (RL). The Equivalent Circuit Model (ECM) not only provides a high-performance initial structure but also reconstructs the solution space by compressing the infinite geometric space into a finite, physically meaningful parameter domain. These physical constraints are maintained throughout optimization, so every exploration step remains within physically realizable bounds. Optimization proceeds hierarchically within a real-time closed loop between the RL agent and a full-wave simulator: Q-learning first optimizes the macroscopic parameters, and Hierarchical Proximal Policy Optimization (HPPO) then refines the microscopic topology. Physical and manufacturing constraints are embedded into both the action space and the reward function, improving simulation-to-fabrication consistency. An ultra-wideband absorbing metasurface designed with this framework achieves over 90% absorption from 2.9 to 17.2 GHz (relative bandwidth 143%) under normal incidence, with excellent polarization insensitivity and sustained absorption at oblique incidence up to 60°. Measured results agree well with simulations. This theory-guided, hierarchically optimized, constraint-embedded framework provides a new pathway for metasurface inverse design.

• A theory-guided inverse design framework is established in which physical theory serves as a solution space reconstructor rather than merely an initial structure generator.
• Macroscopic parameter optimization via Q-learning and microscopic topological refinement via HPPO are integrated within a single reinforcement learning loop.
• Manufacturing constraints, including minimum feature size and connectivity, are embedded directly into the reward function, ensuring simulation-to-fabrication consistency without post-processing.
• The proposed hierarchical strategy reduces search complexity and escapes local optima in high-dimensional discrete design spaces, achieving a 12% bandwidth improvement over conventional topology optimization.
• An ultra-wideband absorbing metasurface is designed and fabricated, exhibiting over 90% absorption from 2.9 to 17.2 GHz (relative bandwidth 143%) with excellent polarization insensitivity and stable performance at oblique incidence up to 60°.
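To illustrate how manufacturing constraints such as connectivity and minimum feature size might enter a reward function, the following is a minimal Python sketch. It is not the authors' implementation. The binary pixel grid encoding (1 = metal, 0 = empty), the function names, the run-length proxy for feature size, the penalty weight, and the `absorption_score` input (which would come from the full-wave simulator) are all assumptions made for illustration.

```python
from collections import deque


def is_connected(grid):
    """Return True if all metal pixels (value 1) form one 4-connected region."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    if not cells:
        return False
    rows, cols = len(grid), len(grid[0])
    seen = {cells[0]}
    queue = deque([cells[0]])
    while queue:  # breadth-first search from the first metal pixel
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(cells)


def min_run_length(grid):
    """Shortest horizontal or vertical run of metal pixels (crude feature-size proxy).

    Returns None if the grid contains no metal.
    """
    best = None
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    for line in lines:
        run = 0
        for v in line + [0]:  # trailing 0 closes a run that reaches the edge
            if v:
                run += 1
            elif run:
                best = run if best is None else min(best, run)
                run = 0
    return best


def constrained_reward(absorption_score, grid, min_feature_px=2, penalty=1.0):
    """Hypothetical reward: simulator score minus a penalty per violated constraint."""
    reward = absorption_score
    if not is_connected(grid):
        reward -= penalty
    run = min_run_length(grid)
    if run is None or run < min_feature_px:
        reward -= penalty
    return reward


# Usage: a 2x2 patch satisfies both constraints; two isolated pixels violate both.
patch = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
isolated = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
reward_patch = constrained_reward(0.9, patch)
reward_isolated = constrained_reward(0.9, isolated)
```

In this sketch a violating pattern is penalized rather than rejected, so the agent still receives a learning signal from infeasible designs. A hard mask on the action space, which the abstract also mentions, would be a complementary way to enforce the same constraints.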
Shunkai et al. (Wed,) studied this question.