February 5, 2026 | Open Access

From reasoning to acting: A program synthesis approach to reinforcement learning

Key Points

This work aims to enhance how artificial agents learn and act by integrating program synthesis with reinforcement learning.
Introduced GCRL-LTL, a deductive synthesis framework that generates behavioral plans provably correct with respect to temporal logic specifications.
Developed π-PRL and π-HPRL, which discover interpretable program structures for agent policies through a differentiable relaxation of the program search space.
Created PREFORL, an inductive synthesis method that learns effective behaviors from static offline datasets, using a contrastive learning objective instead of direct value function estimation.
Demonstrated zero-shot generalization by decoupling high-level planning from low-level control.
Showcased improved sample efficiency and verifiability in policies generated through these synthesis approaches.
Produced agents that are both high-performing and transparent in their reasoning and planning.

Abstract

The pursuit of artificial agents that can autonomously learn and act in complex environments is a central ambition of Artificial Intelligence (AI). Within this pursuit, Reinforcement Learning (RL) has emerged as a core computational framework for creating agents that can bridge the gap from high-level reasoning to effective real-world acting. The dominant Deep RL paradigm, however, faces fundamental challenges with policy opacity, which precludes formal verification, and compositional brittleness, which leads to poor sample efficiency. This dissertation argues that the principles of Program Synthesis provide a unified framework to address these challenges by creating policies that are structured, verifiable, and grounded in data. We develop this thesis through three interconnected contributions that span the spectrum from formal logic to learned behavior. First, to enable high-level reasoning, we introduce GCRL-LTL, a deductive synthesis framework where agents generate behavioral plans that are provably correct with respect to complex temporal logic specifications, enabling zero-shot generalization by decoupling high-level planning from low-level control. To translate these plans into structured policies, we then develop π-PRL and π-HPRL, a search-based synthesis approach that automatically discovers the optimal, interpretable program structure for an agent's policy through a differentiable relaxation of the program search space. Finally, to ensure these policies are grounded in experience and can learn to act robustly, we introduce PREFORL, an inductive synthesis method that learns effective behaviors from static, offline datasets of examples by bypassing direct value function estimation via a novel contrastive learning objective. 
Collectively, this work demonstrates a complete pathway from reasoning to acting within RL, showcasing how the diverse strengths of program synthesis can be leveraged to create a new generation of autonomous agents that are not only high-performing, but also logical in their reasoning, transparent in their planning, and robustly grounded in their actions, representing a meaningful step towards more reliable and Trustworthy AI.
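
To make the idea of a "differentiable relaxation of the program search space" more concrete, the following is a minimal sketch and not the π-PRL or π-HPRL implementation. It assumes a toy setting in which a policy must pick one of three hypothetical candidate program fragments (`go_left`, `go_right`, `proportional`). Instead of searching over discrete choices, it mixes all candidates with softmax weights, fits those weights by gradient descent, and then extracts the highest-weighted candidate as the final program. All names, the data, and the squared-error loss are illustrative assumptions.

```python
import math

# Hypothetical candidate program fragments (observation -> action).
# These are illustrative assumptions, not taken from the dissertation.
CANDIDATES = {
    "go_left": lambda x: -1.0,
    "go_right": lambda x: 1.0,
    "proportional": lambda x: -0.5 * x,
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relaxed_action(logits, x):
    """Continuous relaxation: a softmax-weighted mixture of all candidates."""
    weights = softmax(logits)
    return sum(w * f(x) for w, f in zip(weights, CANDIDATES.values()))

def train(data, steps=500, lr=0.5):
    """Fit the selection logits by gradient descent on mean squared error."""
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        grads = [0.0] * len(logits)
        for x, target in data:
            w = softmax(logits)
            outs = [f(x) for f in CANDIDATES.values()]
            y = sum(wi * oi for wi, oi in zip(w, outs))
            err = y - target
            # Softmax mixture gradient: dy/dlogit_i = w_i * (out_i - y)
            for i in range(len(logits)):
                grads[i] += 2.0 * err * w[i] * (outs[i] - y) / len(data)
        logits = [l - lr * g for l, g in zip(logits, grads)]
    return logits

def extract_program(logits):
    """Discretize: keep only the candidate with the largest logit."""
    names = list(CANDIDATES)
    return names[max(range(len(names)), key=lambda i: logits[i])]

# Toy demonstrations generated by the "proportional" behavior.
data = [(x, -0.5 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
logits = train(data)
print(extract_program(logits))  # prints: proportional
```

The final `extract_program` step is what yields a single, readable program rather than an opaque mixture; in this sketch the relaxation exists only to make the choice among candidates trainable by gradients.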
