The rapid proliferation of reasoning-distilled large language models (LLMs) relies on the premise that Supervised Fine-Tuning (SFT) on the reasoning traces of Reinforcement Learning (RL) teachers transfers causal verification capabilities to smaller models. In this work, we empirically challenge this assumption. We introduce Mid-Thought State Perturbation (MTSP), a dynamic evaluation protocol that forcefully injects adversarial arithmetic errors directly into models' active reasoning traces. Evaluating across True RL Teachers (DeepSeek-R1, OpenAI o3-mini) and distilled SFT families (Qwen, Llama) on the GSM8K benchmark, we identify the Verifier Gap. While RL Teachers actively catch and recover from injected errors up to 90. 2% of the time, SFT-distilled students frequently bypass corrupted logic to hallucinate the correct final answer. Crucially, we demonstrate a Multi-Family Negative Scaling Law: as student models scale, their rate of unfaithful reasoning paradoxically worsens, reaching 47. 8% in Qwen-14B and 44. 8% in Llama-70B (p < 10^-11). Through Contextual Amnesia, Logit Lens probing, and semantically void filler-token ablations, we explain this scaling failure via Syntactic-Logical Divergence. While target answers exist in the larger models' top-10 latent probabilities 94. 1% of the time before reasoning begins, ablation of the Chain-of-Thought causes their accuracy to collapse to 33. 9%. Our findings mechanically prove that SFT reasoning models decouple computation from logic, utilizing the explanatory trace not as a verified causal sequence, but as a performative "dummy scratchpad" to purchase sequence FLOPs without verifying intermediate logical steps. This exposes a fundamental limitation in current distillation paradigms, establishing that scaling SFT alone cannot safely replicate the internal verification mechanisms of true RL models.
Ayush Anand (Tue,) studied this question.