Safe long-horizon planning requires more than maximizing reward at isolated states. In reinforcement learning, robotics, world-model planning, and embodied AI, the safety of a policy often depends on how an agent moves through state space, how uncertain its transitions are, whether it enters unreliable regions, and whether it remains stably within desirable operating regimes. This paper introduces Conservative Lapse-Action Planning (CLAP), a variational framework for safe latent trajectory optimization. CLAP replaces raw reward maximization with an access-and-dwell objective: the agent should reach the best admissible safe high-lapse region under explicit speed, acceleration, uncertainty, safety, and out-of-distribution constraints, and then dwell there stably. The framework defines a conservative lapse field that combines value, uncertainty, safety cost, and distributional reliability into a single scalar. Under compactness, continuity, speed-margin, finite-access, and lapse-gap assumptions, we show that the CLAP action is nonnegative, that minimizers exist, and that long-horizon minimizers concentrate near the best admissible target set. We develop a family of variants: CLAP, RRLA, DU-CLAP, Adaptive DU-CLAP, A-CLAP, Learned-Gate A-CLAP, and Phase-Adaptive Learned-Gate A-CLAP. Experiments in latent planning environments show that reward-only planners collapse into unsafe traps, that projected MPC produces long-horizon dwell behavior, and that learned gating reduces exposure to transition error. Base CLAP is the variant with the strongest current theoretical guarantees, while Phase-Adaptive Learned-Gate A-CLAP is the most promising candidate for further research.
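The abstract does not give the functional form of the lapse field or of the CLAP action, so the following is only a minimal sketch under stated assumptions. It assumes a linear lapse field (value minus weighted uncertainty, safety-cost, and out-of-distribution penalties) and a discretized access-and-dwell action built from the lapse gap to a best lapse value plus squared hinge penalties on speed and acceleration. The toy 1-D landscape, the weights, and every function name (`conservative_lapse`, `clap_action`, and so on) are hypothetical and chosen for illustration. The example shows how a reward-only criterion prefers a high-value unsafe trap, while the conservative action prefers reaching and dwelling in the safe region.

```python
import numpy as np

# Hypothetical 1-D toy latent space; all shapes, weights and names are assumptions.
def bump(z, center, width):
    return np.exp(-((z - center) ** 2) / width)

def value(z):        # reward-model value: safe peak at z=1, larger trap peak at z=3
    return 6.0 * bump(z, 1.0, 0.1) + 10.0 * bump(z, 3.0, 0.1)

def safety_cost(z):  # the high-value region at z=3 is unsafe
    return 8.0 * bump(z, 3.0, 0.1)

def uncertainty(z):  # and poorly modelled
    return 2.0 * bump(z, 3.0, 0.2)

def ood_score(z):    # no out-of-distribution penalty in this toy example
    return np.zeros_like(np.asarray(z, dtype=float))

def conservative_lapse(z, w_unc=1.0, w_safe=1.0, w_ood=1.0):
    """Assumed linear lapse field: value minus weighted penalty terms."""
    return (value(z) - w_unc * uncertainty(z)
            - w_safe * safety_cost(z) - w_ood * ood_score(z))

def clap_action(traj, lapse_star, dt=0.1, v_max=2.0, a_max=20.0, rho=10.0):
    """Assumed discretized access-and-dwell action for a 1-D trajectory.

    Sums the lapse gap (lapse_star - lapse(z_t)) plus squared hinge penalties
    on speed and acceleration above v_max / a_max. The result is nonnegative
    whenever lapse_star upper-bounds the lapse at every visited state.
    """
    traj = np.asarray(traj, dtype=float)
    vel = np.diff(traj) / dt
    acc = np.diff(vel) / dt
    gap = lapse_star - conservative_lapse(traj)
    speed_pen = np.maximum(np.abs(vel) - v_max, 0.0) ** 2
    acc_pen = np.maximum(np.abs(acc) - a_max, 0.0) ** 2
    return float(dt * (gap.sum() + rho * (speed_pen.sum() + acc_pen.sum())))

# Best lapse over a finite candidate grid stands in for the admissible target value.
grid = np.linspace(0.0, 4.0, 401)
lapse_star = conservative_lapse(grid).max()

T = 60
to_safe = np.minimum(np.linspace(0.0, 1.5, T), 1.0)  # reach z=1, then dwell
to_trap = np.minimum(np.linspace(0.0, 4.5, T), 3.0)  # reach z=3, then dwell

print("reward-only argmax:", grid[np.argmax(value(grid))])
print("lapse argmax:      ", grid[np.argmax(conservative_lapse(grid))])
print("reward-only prefers trap:", value(to_trap).sum() > value(to_safe).sum())
print("CLAP prefers safe dwell: ",
      clap_action(to_safe, lapse_star) < clap_action(to_trap, lapse_star))
```

In this toy setting the trap has the largest raw value, but its safety and uncertainty penalties drive its lapse to roughly zero, so dwelling there accumulates a large lapse gap. The hinge form keeps the speed and acceleration terms at zero for admissible motion; other penalty shapes would serve the same illustrative purpose.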
Author: Rishabh Ashok Patil.