What does this research mean for the field?

The Conservative Lapse-Action Planning (CLAP) framework replaces raw reward maximization with an access-and-dwell objective for latent trajectory optimization. In the reported latent planning experiments, this avoids the collapse into unsafe traps seen with reward-only planning, and learned gating reduces transition-error exposure. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The central aim is to develop a safe planning framework that optimizes latent trajectories under explicit speed, acceleration, uncertainty, safety, and out-of-distribution constraints.

June 1, 2026 | Open Access

Conservative Lapse-Action Planning: A Variational Access-and-Dwell Framework for Safe Latent Trajectory Optimization

Key Points

Aims to develop a safe planning framework that optimizes latent trajectories under explicit speed, acceleration, uncertainty, safety, and out-of-distribution constraints.
Introduces the Conservative Lapse-Action Planning (CLAP) framework for latent trajectory optimization.
Develops several variants, including Adaptive DU-CLAP and Phase-Adaptive Learned-Gate A-CLAP.
Evaluates these methods in experiments in latent planning environments.
Shows that reward-only planning collapses into unsafe traps in these experiments, while CLAP forms long-horizon dwell under projected MPC.
Finds that learned-gate variants such as Phase-Adaptive Learned-Gate A-CLAP show improved stability and reduced transition-error exposure during execution.

Abstract

Safe long-horizon planning requires more than maximizing reward at isolated states. In reinforcement learning, robotics, world-model planning, and embodied AI, the safety of a policy often depends on how an agent moves through state space, how uncertain its transitions are, whether it enters unreliable regions, and whether it remains stably within desirable operating regimes. This paper introduces Conservative Lapse-Action Planning (CLAP), a variational framework for safe latent trajectory optimization. CLAP replaces raw reward maximization with an access-and-dwell objective: an agent should reach the best admissible safe high-lapse region, under explicit speed, acceleration, uncertainty, safety, and out-of-distribution constraints, and then dwell there stably. The framework defines a conservative lapse field combining value, uncertainty, safety cost, and distributional reliability into one scalar. Under compactness, continuity, speed-margin, finite-access, and lapse-gap assumptions, the CLAP action is nonnegative, admits minimizers, and concentrates long-horizon minimizers near the best admissible target set. We develop CLAP, RRLA, DU-CLAP, Adaptive DU-CLAP, A-CLAP, Learned-Gate A-CLAP, and Phase-Adaptive Learned-Gate A-CLAP. Experiments in latent planning environments show reward-only collapse into unsafe traps, long-horizon dwell formation under projected MPC, and reduced transition-error exposure under learned gating. The strongest current theorem object is base CLAP. The strongest research candidate is Phase-Adaptive Learned-Gate A-CLAP.
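The idea of a conservative lapse field can be illustrated with a minimal sketch. The abstract states only that the field combines value, uncertainty, safety cost, and distributional reliability into one scalar; it does not give the formula. The linear penalized form below, the weights, the thresholds, and every name in the code (LatentEstimates, LapseWeights, conservative_lapse, best_admissible_target) are assumptions made for illustration, not the paper's definitions. Speed and acceleration constraints, which act on trajectories rather than single states, are left out.

```python
from dataclasses import dataclass


@dataclass
class LatentEstimates:
    """Hypothetical per-state estimates supplied by a world model."""
    value: float        # estimated value at the latent state
    uncertainty: float  # transition uncertainty (larger = less certain)
    safety_cost: float  # estimated safety cost (larger = less safe)
    ood_score: float    # out-of-distribution score (larger = less reliable)


@dataclass
class LapseWeights:
    """Assumed penalty weights; the paper's combination may differ."""
    uncertainty: float = 1.0
    safety: float = 1.0
    ood: float = 1.0


def conservative_lapse(e: LatentEstimates, w: LapseWeights) -> float:
    """Assumed scalar field: value minus weighted penalties."""
    return (e.value
            - w.uncertainty * e.uncertainty
            - w.safety * e.safety_cost
            - w.ood * e.ood_score)


def is_admissible(e: LatentEstimates, max_uncertainty: float,
                  max_safety_cost: float, max_ood: float) -> bool:
    """State-level admissibility check under assumed thresholds."""
    return (e.uncertainty <= max_uncertainty
            and e.safety_cost <= max_safety_cost
            and e.ood_score <= max_ood)


def best_admissible_target(candidates, w, max_uncertainty,
                           max_safety_cost, max_ood):
    """Return the name of the admissible candidate with the highest lapse,
    or None when no candidate is admissible."""
    admissible = {
        name: e for name, e in candidates.items()
        if is_admissible(e, max_uncertainty, max_safety_cost, max_ood)
    }
    if not admissible:
        return None
    return max(admissible, key=lambda n: conservative_lapse(admissible[n], w))


# Toy candidates: a high-reward but unreliable "trap" and a modest "safe" region.
candidates = {
    "trap": LatentEstimates(value=10.0, uncertainty=3.0,
                            safety_cost=4.0, ood_score=2.0),
    "safe": LatentEstimates(value=6.0, uncertainty=0.5,
                            safety_cost=0.2, ood_score=0.1),
}
weights = LapseWeights()

reward_only_choice = max(candidates, key=lambda n: candidates[n].value)
clap_style_choice = best_admissible_target(
    candidates, weights,
    max_uncertainty=1.0, max_safety_cost=1.0, max_ood=1.0,
)
```

In this toy setting, ranking by value alone selects the trap, while ranking admissible states by the assumed lapse field selects the safe region. This mirrors the access half of the access-and-dwell objective only; dwelling stably at the target would require a trajectory-level planner such as the projected MPC mentioned in the abstract.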
