The pursuit of artificial agents that can autonomously learn and act in complex environments is a central ambition of Artificial Intelligence (AI). Within this pursuit, Reinforcement Learning (RL) has emerged as a core computational framework for creating agents that can bridge the gap between high-level reasoning and effective real-world acting. The dominant Deep RL paradigm, however, faces two fundamental challenges: policy opacity, which precludes formal verification, and compositional brittleness, which leads to poor sample efficiency. This dissertation argues that the principles of program synthesis provide a unified framework to address these challenges by creating policies that are structured, verifiable, and grounded in data. We develop this thesis through three interconnected contributions that span the spectrum from formal logic to learned behavior. First, to enable high-level reasoning, we introduce GCRL-LTL, a deductive synthesis framework in which agents generate behavioral plans that are provably correct with respect to complex temporal logic specifications; decoupling high-level planning from low-level control enables zero-shot generalization. Second, to translate these plans into structured policies, we develop π-PRL and π-HPRL, a search-based synthesis approach that automatically discovers the optimal, interpretable program structure for an agent's policy through a differentiable relaxation of the program search space. Finally, to ensure these policies are grounded in experience and can learn to act robustly, we introduce PREFORL, an inductive synthesis method that learns effective behaviors from static, offline datasets of examples, bypassing direct value function estimation via a novel contrastive learning objective.
Collectively, this work demonstrates a complete pathway from reasoning to acting within RL. It shows how the diverse strengths of program synthesis can be leveraged to create a new generation of autonomous agents that are not only high-performing but also logical in their reasoning, transparent in their planning, and robustly grounded in their actions, a meaningful step towards more reliable and trustworthy AI.
Wenjie Qiu studied this question.