Recent advances in foundation models have motivated hybrid programming that integrates natural language descriptions, formal specifications, and executable code. A critical challenge in such systems lies in achieving semantic alignment across heterogeneous representations at different abstraction levels. This challenge is particularly pressing in programmatic reinforcement learning (PRL) for robotics, where long-horizon, sparse-reward tasks demand tight coordination between symbolic reasoning and continuous control. Existing approaches either rely on manually engineered symbolic representations or on LLM-generated plans that might be untrustworthy or infeasible to execute, leaving such fundamental gaps unaddressed. We present a closed-loop, counterexample-guided synthesis framework that unifies LLM-based planning, formal verification, and differentiable behavior tree (BT) synthesis for neuro-symbolic policy learning. The framework first converts natural language task descriptions into PDDL, then generates valid high-level plans via a Guess-Check-Critique loop that interleaves LLM generation with SMT-based verification. We automatically compile symbolic abstractions in verified plans into parameterized termination conditions, and co-optimize them with low-level policies to enforce semantic consistency. The learned sub-policies are composed into an integrated BT and fine‑tuned to ensure task-level executability. Furthermore, we perform closed-loop iterations of abstractions and compilations using feedback from verification and learning, while incrementally building a reusable skill library for efficient knowledge transfer. Experiments on challenging long-horizon robotic tasks show that our method exceeds state-of-the-art methods while providing interpretability and generalization.
Huang et al. (Mon,) studied this question.