Large language models are increasingly deployed as agents. They write code that executes, run database migrations, navigate browsers, and operate on production systems. Their training objective, next-token prediction, contains no signal about what their actions actually do. A code agent that proposes DROP TABLE has not modeled the resulting database state. It has predicted a plausible token sequence. We argue this is the central reliability gap in current agent systems, and that closing it requires giving agents an internal forward model: given a candidate action, predict the outcome before executing. We propose the Action-Conditioned Latent Structural Causal Model (AC-LSCM) as one such mechanism. AC-LSCM maintains a small set of latent factors related by a learned sparse directed acyclic graph (DAG), and implements actions as structural interventions in the sense of Pearl's do-operator rather than as context concatenation. We evaluate AC-LSCM on synthetic structural causal models and report two findings. First, on a safety-critical agent planning task, AC-LSCM reduces the safety violation rate roughly 36-fold relative to a Transformer baseline (mean rate 0.005 versus 0.180 across 13 seeds). Twelve of those 13 seeds produce zero safety violations. An attribution control confirms that the result is driven by the architectural design, not by data volume. Second, the architecture as originally specified is over-engineered. Ablations show the do-operator and the abduction loop carry the result, while the NOTEARS DAG constraint and a contrastive hinge term are net-negative components that we recommend removing. We report the negative results in full and propose a simplified follow-up architecture. Training is unstable at the scales tested: roughly a third of seeds fail to produce a usable planner despite normal training-time MSE. Even on those failing seeds, safety violation rates remain below the Transformer baseline.
Code, configs, and per-seed result JSONs accompany the preprint. All experiments ran on a single NVIDIA Tesla T4 in fp32.
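To make the distinction between a structural intervention and context concatenation concrete, the following is a minimal sketch of abduction followed by a do-style intervention on a toy linear structural causal model. It is not the AC-LSCM implementation: the three-factor chain, the hand-set weights, and the names rollout and abduct are assumptions chosen for illustration, whereas the model described above learns its latent factors and graph.

```python
# Toy linear SCM over three latent factors: z0 -> z1 -> z2.
# Hypothetical illustration only; weights are hand-set, not learned.

def rollout(weights, noise, do=None):
    """Evaluate z_j = sum_i weights[i][j] * z_i + noise[j] in index order.

    weights is strictly upper-triangular, so index order is a topological
    order of the DAG. `do` maps a factor index to a clamped value.
    """
    do = do or {}
    z = []
    for j in range(len(noise)):
        if j in do:
            # Intervention: cut incoming edges and noise, clamp the value.
            z.append(float(do[j]))
        else:
            parents = sum(weights[i][j] * z[i] for i in range(j))
            z.append(parents + noise[j])
    return z


def abduct(weights, z):
    """Recover the noise terms consistent with an observed state z."""
    return [
        z[j] - sum(weights[i][j] * z[i] for i in range(j))
        for j in range(len(z))
    ]


W = [
    [0.0, 2.0, 0.0],   # z0 -> z1 with weight 2
    [0.0, 0.0, -1.0],  # z1 -> z2 with weight -1
    [0.0, 0.0, 0.0],
]
noise = [1.0, 0.5, 0.0]

observed = rollout(W, noise)                       # no action taken
inferred = abduct(W, observed)                     # abduction step
predicted = rollout(W, inferred, do={1: 0.0})      # effect of do(z1 = 0)
```

In this sketch the intervention on z1 leaves its upstream cause z0 unchanged and propagates only to its descendant z2, which is the behavior that separates do(z1 = 0) from merely conditioning on, or appending, the action to the context.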
Mallesh Madapathi (Mon,) studied this question.