This work shows that, under a fully controlled low-data EN→UA NMT protocol (N = 142), deterministic corpus ordering acts as the dominant structural driver within the tested regime. In a single-factor ablation that holds the model, tokenizer, LoRA, seeds, and corpus content constant, altering only the corpus sequencing produces reproducible degradation in chain-level coherence, including boundary amplification (~11.6%), while entropy remains invariant. Under the same protocol, contradiction magnitude (ΔV) does not emerge as a dominant driver. Corpus organization is therefore an active structural variable in low-data NMT, not a preprocessing detail. This finding motivates the layered diagnostic architecture described in the companion series. All results are strictly bounded to the fixed low-data protocol; no generalization beyond this setting is claimed.
Kuzmenko et al. studied this question.
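The single-factor design, where only corpus sequencing changes while corpus content stays fixed, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sentence pairs, the helper names `reorder` and `unigram_entropy`, and in particular the entropy measure itself. The abstract does not define its entropy, so the sketch uses unigram token entropy, which is invariant under reordering by construction and may differ from the measure used in the study.

```python
import math
import random
from collections import Counter

def unigram_entropy(corpus):
    """Shannon entropy (bits) of the whitespace-token distribution.

    Assumed measure for illustration only; it depends on token counts,
    not on the order in which sentence pairs appear.
    """
    counts = Counter(tok for src, tgt in corpus for tok in f"{src} {tgt}".split())
    total = sum(counts.values())
    # Sort the counts so the floating-point sum does not depend on insertion order.
    return -math.fsum((c / total) * math.log2(c / total) for c in sorted(counts.values()))

def reorder(corpus, seed):
    """Deterministic permutation: the same seed always yields the same order."""
    permuted = list(corpus)
    random.Random(seed).shuffle(permuted)
    return permuted

# Hypothetical toy pairs standing in for the N = 142 parallel corpus.
baseline = [(f"source sentence {i}", f"target sentence {i % 7}") for i in range(142)]
variant = reorder(baseline, seed=0)

# Content is held constant: both orderings contain exactly the same pairs.
same_content = Counter(baseline) == Counter(variant)
# Only the sequencing differs between the two conditions.
order_changed = baseline != variant
# The assumed entropy measure is unchanged by the reordering.
entropy_gap = abs(unigram_entropy(baseline) - unigram_entropy(variant))
```

In an actual ablation, `baseline` and `variant` would each be fed to the same training pipeline with identical seeds, so that any difference in downstream metrics is attributable to ordering alone.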