Contemporary grounded planning agents commonly delegate object selection to external resolvers (APIs, heuristics, and privileged interfaces), a pattern that masks representational failures. We isolate object commitment as a diagnostic variable: two agents with identical perception, world models, and training differ only in whether object arguments are resolved externally (Variant A) or internally (Variant B). This setup exposes when mean-pooled representations, standard in model-based RL systems such as Dreamer and PlaNet, fail at categorical object grounding. We demonstrate four counterintuitive dissociations. First, planning and grounding are orthogonal (the Archive Dichotomy): the same model achieves 100% multi-step planning success yet fails completely (0%) at object selection on the same tasks once the number of candidate objects, and hence selection entropy, rises from 4 to 30 or more. Second, the Data Scale Paradox: training on homogeneous data causes performance to collapse from 100% (300 trajectories) to 0% (500 or more trajectories); more data actively harms performance through a statistical gravity that reinforces dominant patterns at the expense of task-conditional reasoning. Third, width-only capacity scaling destroys capability: an 8M-parameter model (a 23× increase) underperforms the 343K-parameter base model, while balanced width-and-depth scaling (75M, a 220× increase) recovers planning but not grounding, revealing that capability emergence depends on the ratio of width to depth. Fourth, grounding bottlenecks are architectural, not parametric: even 220× scaling cannot overcome categorical failures under high entropy, confirming that mean pooling imposes a representational ceiling. These failure modes (semantic drift under statistical gravity, topology-dependent collapse, and entropy-sensitive grounding) parallel pathologies in large language models such as hallucination and entity confusion, suggesting mechanistic connections between filesystem agents and frontier AI systems. Delegation conceals these failures; internal commitment exposes them. The gap is diagnostic.
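The dilution intuition behind entropy-sensitive grounding can be illustrated with a toy sketch. This is not the paper's architecture: the two-feature object encoding, the target flag, and the helper names (`mean_pool`, `target_signal_after_pooling`, `commit_by_scoring`) are hypothetical. Under these assumptions, a mean-pooled summary carries the target marker only at weight 1/N (0.25 with 4 objects, roughly 0.033 with 30), and the pooled vector is identical no matter which object is the target. Scoring each object individually, by contrast, preserves which index was chosen.

```python
"""Toy illustration (hypothetical, not the paper's model) of how a mean-pooled
object summary dilutes and erases the identity of a single target object."""

from typing import List

Vector = List[float]


def mean_pool(objects: List[Vector]) -> Vector:
    """Average a set of equal-length object feature vectors elementwise."""
    n = len(objects)
    dim = len(objects[0])
    return [sum(obj[d] for obj in objects) / n for d in range(dim)]


def make_objects(num_objects: int, target_index: int) -> List[Vector]:
    """Assumed encoding: each object is [flag, 1.0]; only the target has flag = 1.0."""
    objects = [[0.0, 1.0] for _ in range(num_objects)]
    objects[target_index][0] = 1.0  # mark the task target
    return objects


def target_signal_after_pooling(num_objects: int) -> float:
    """Pooled value of the target flag; equals 1/N, so it shrinks as N grows."""
    return mean_pool(make_objects(num_objects, 0))[0]


def commit_by_scoring(objects: List[Vector], query: Vector) -> int:
    """Hypothetical internal commitment: score every object against the query
    (dot product) and return the best index, so object identity is preserved."""
    scores = [sum(o * q for o, q in zip(obj, query)) for obj in objects]
    return max(range(len(objects)), key=lambda i: scores[i])


if __name__ == "__main__":
    for n in (4, 30):
        print(n, "objects -> pooled target signal", target_signal_after_pooling(n))
    # The pooled vector cannot tell target 0 apart from target 7 ...
    print(mean_pool(make_objects(30, 0)) == mean_pool(make_objects(30, 7)))
    # ... but per-object scoring recovers the committed index.
    print(commit_by_scoring(make_objects(30, 7), [1.0, 0.0]))
```

The choice of a mean over a sum is deliberate: it mirrors permutation-invariant pooling, whose output is normalized by set size, which is exactly what makes the per-object evidence fall as the candidate set grows.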
Shoryavardhaan Gupta (Sun,) studied this question.