A key challenge in reinforcement learning (RL) is enabling agents to generalize from experience, transferring knowledge gained in one environment to new and varied contexts. Generalization is essential in real-world applications, where agents must adapt to distribution shifts and contextual variations. In this work, we propose a framework that integrates visual domain randomization with multimodal foundation models to improve the robustness and adaptability of RL agents. This integration allows agents to learn policies that are resilient to environmental changes and visual discrepancies. We evaluate our method on the MiniGrid benchmark, including the unseen test environment DistShift1. There it achieves a mean return of 0.85, compared with 0.32 for a Proximal Policy Optimization (PPO) baseline. These results demonstrate the effectiveness of our framework in addressing distribution shift and highlight its potential for real-world RL applications.
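To make "visual domain randomization" concrete, the following is a minimal, generic sketch of one common form of it: a random per-channel color jitter applied to an RGB observation before it reaches the policy. It assumes observations are rendered as `uint8` arrays of shape (H, W, 3). The function name `randomize_colors` and its default ranges are hypothetical illustrations. They are not the randomization scheme used in this work, which the abstract does not specify.

```python
import numpy as np


def randomize_colors(obs: np.ndarray,
                     rng: np.random.Generator,
                     scale_range=(0.6, 1.4),
                     shift_range=(-30.0, 30.0)) -> np.ndarray:
    """Hypothetical visual randomization: random per-channel affine color jitter.

    obs: uint8 RGB observation of shape (H, W, 3); it is not modified.
    Returns a new uint8 array of the same shape.
    """
    # Sample one multiplicative gain and one additive offset per RGB channel.
    scale = rng.uniform(scale_range[0], scale_range[1], size=3)
    shift = rng.uniform(shift_range[0], shift_range[1], size=3)
    # Work in float to avoid uint8 overflow, then clip back to the valid pixel range.
    out = obs.astype(np.float32) * scale + shift
    return np.clip(out, 0, 255).astype(np.uint8)


# Usage: jitter a synthetic 8x8 frame with a seeded generator.
rng = np.random.default_rng(0)
frame = np.full((8, 8, 3), 128, dtype=np.uint8)
augmented = randomize_colors(frame, rng)
```

Sampling fresh parameters for each episode (or each frame) exposes the agent to many color variants of the same layout. This discourages the policy from relying on exact pixel values that may change under distribution shift.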
Salhab et al. (Sun,) studied this question.