Engineering electron correlations in quantum dot arrays demands navigation of high-dimensional, non-convex parameter spaces, where hole doping fundamentally alters the physics. We present a rigorous comparative study of two control paradigms for the Hubbard model doped one hole away from half filling: (i) systematic physics-guided design and (ii) autonomous deep reinforcement learning (RL) with geometry-aware neural architectures. Systematic analysis reveals key design principles, such as field-induced localization for trapping the mobile hole, but it is computationally intractable as an optimization method. We demonstrate that an autonomous RL agent, benchmarked across five 3D lattices (tetrahedron to FCC), achieves human-competitive accuracy (R² 0.97) and 95.5% success on held-out tasks. Critically, the RL agent reaches this performance with 10³–10⁴× greater sample efficiency than grid search and outperforms other black-box optimization methods. Transfer learning yields 91% few-shot generalization to unseen geometries. This work establishes autonomous RL as a viable, highly efficient framework for rapid optimization and non-obvious strategy discovery in complex quantum systems.
Dwivedi et al. (Sun,) studied this question.