This paper presents a new reinforcement learning (RL)-driven inverse design strategy that leverages the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for the efficient optimization of photonic structures, with metamaterial absorbers (MAs) and cross-polarization converters (CPCs) as demonstrative applications. Unlike conventional heuristic or surrogate-based optimization methods, the proposed RL approach autonomously learns the optimal geometric configuration through direct interaction with the simulation environment, without requiring gradient information or pre-built surrogate models. First, the TD3 model is used to optimize the geometric parameters of an existing MA based on an L-shaped resonator, raising its absorption above 90% over the frequency range from 12.2 GHz to 22.4 GHz in only 23 iterations. Then, a novel CPC design is proposed, optimized using the same RL framework, and subsequently fabricated. The fabricated structure achieves a polarization conversion ratio (PCR) above 90% over a wide frequency range from 11.8 GHz to 24.2 GHz, covering the full Ku band and most of the K band. Furthermore, over most of this frequency range, the converter maintains strong performance under oblique incidence, with PCR levels above 80% for incidence angles up to 50°. These results validate the effectiveness of the TD3-based RL framework in discovering high-performance and fabrication-ready designs, while also establishing a scalable and generalizable optimization paradigm for advanced photonic devices.
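The band figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the commonly used reflective-converter definition PCR = |r_cross|^2 / (|r_cross|^2 + |r_co|^2), which the abstract does not state explicitly, and it assumes the IEEE letter-band edges of 12 to 18 GHz for Ku and 18 to 27 GHz for K. The function names `pcr`, `fractional_bandwidth`, and `band_coverage` are hypothetical helpers introduced here.

```python
def pcr(r_co: complex, r_cross: complex) -> float:
    """Polarization conversion ratio from co- and cross-polarized
    reflection coefficients (assumed definition, see lead-in)."""
    p_cross = abs(r_cross) ** 2
    p_co = abs(r_co) ** 2
    return p_cross / (p_cross + p_co)


def fractional_bandwidth(f_low: float, f_high: float) -> float:
    """Bandwidth divided by the arithmetic center frequency."""
    center = (f_low + f_high) / 2.0
    return (f_high - f_low) / center


def band_coverage(f_low: float, f_high: float,
                  band_low: float, band_high: float) -> float:
    """Fraction of [band_low, band_high] that lies inside [f_low, f_high]."""
    overlap = max(0.0, min(f_high, band_high) - max(f_low, band_low))
    return overlap / (band_high - band_low)


# Frequency ranges quoted in the abstract (GHz).
MA_BAND = (12.2, 22.4)    # absorption above 90%
CPC_BAND = (11.8, 24.2)   # PCR above 90%

# Assumed IEEE letter-band edges (GHz); not given in the source.
KU_BAND = (12.0, 18.0)
K_BAND = (18.0, 27.0)

if __name__ == "__main__":
    print(f"MA fractional bandwidth:  {fractional_bandwidth(*MA_BAND):.1%}")
    print(f"CPC fractional bandwidth: {fractional_bandwidth(*CPC_BAND):.1%}")
    print(f"Ku-band coverage: {band_coverage(*CPC_BAND, *KU_BAND):.1%}")
    print(f"K-band coverage:  {band_coverage(*CPC_BAND, *K_BAND):.1%}")
```

Under these assumptions, the CPC range spans the whole Ku band and roughly two thirds of the K band, which is consistent with the phrase "most of the K band".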
Mahmoud et al. studied this question.