Large language model training still treats text tokens as the main reusable unit of learning. If a training interval has already transformed parameters from θ_t to θ_{t+k}, then the interval also emits a causal object: the state transition that the optimizer actually bought. This paper introduces update objects, compact artifacts that replay useful training movement without replaying all source tokens. I study two concrete objects in small conventional decoder-only Transformers: sparse checkpoint delta replay objects and optimizer transaction capsules. All target arms start from the same θ_t and are compared against raw token replay, gradient-alignment-selected replay, checkpoint distillation, no-source continuation, warm-start upper-bound controls, and shuffled-object ablations. Sparse checkpoint deltas are the strongest result: across two model widths, two text shards, compression sweeps, pessimistic build-cost accounting, and a five-seed d96 paper-core gate, they improve durable held-out cross-entropy progress per paid unit by 2.45–2.90× over the best non-object replay control. A more selective secondary object also survives: optimizer transaction capsules beat raw replay by 1.78× in the same d96 five-seed paper-core gate, while mean-gradient, layerwise-gradient, alignment-weighted, and sign-consensus capsules fail or remain weak. These results do not claim production-scale speedup. They support a narrower thesis: previous training runs can produce reusable transition objects that are empirically different from data selection, distillation, checkpoint warm starts, or parameter-efficient adaptation.
Julio Jose Lena (Thu,) studied this question.
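To make the idea of a sparse checkpoint delta concrete, here is a minimal sketch of how such an object could be built from two checkpoints and replayed onto θ_t. It is not the paper's implementation: the abstract does not specify the selection rule, storage format, or scaling, so the global top-magnitude selection, the `keep_fraction` and `scale` parameters, and the names `build_sparse_delta` and `apply_sparse_delta` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical containers: a flat list of floats per named parameter tensor.
Params = Dict[str, List[float]]
SparseDelta = Dict[str, List[Tuple[int, float]]]  # name -> [(index, delta)]


def build_sparse_delta(theta_t: Params, theta_tk: Params,
                       keep_fraction: float) -> SparseDelta:
    """Keep the largest-magnitude entries of theta_tk - theta_t.

    Assumption: global top-k by absolute value; the paper may select differently.
    """
    entries = [
        (name, i, after - before)
        for name, before_vals in theta_t.items()
        for i, (before, after) in enumerate(zip(before_vals, theta_tk[name]))
    ]
    keep = max(1, int(len(entries) * keep_fraction))
    entries.sort(key=lambda e: abs(e[2]), reverse=True)

    delta: SparseDelta = {}
    for name, i, d in entries[:keep]:
        delta.setdefault(name, []).append((i, d))
    return delta


def apply_sparse_delta(theta: Params, delta: SparseDelta,
                       scale: float = 1.0) -> Params:
    """Return a copy of theta with the stored sparse movement added."""
    out = {name: list(vals) for name, vals in theta.items()}
    for name, pairs in delta.items():
        for i, d in pairs:
            out[name][i] += scale * d
    return out


# Toy checkpoints standing in for theta_t and theta_{t+k}.
theta_t = {"w": [0.0, 1.0, 2.0, 3.0], "b": [0.5, 0.5]}
theta_tk = {"w": [0.1, 1.0, 1.0, 3.02], "b": [0.5, 1.5]}

delta = build_sparse_delta(theta_t, theta_tk, keep_fraction=0.5)
replayed = apply_sparse_delta(theta_t, delta)
```

In this toy case the object stores three of six coordinates, so the small change to the last entry of `w` is dropped while the larger movements are replayed. A target arm would start from θ_t, apply the object, and then be scored on held-out cross-entropy against the replay controls.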