This technical note presents a minimal synthetic stress test of local robustness under reduced context interfaces in large language model inference. It introduces a controlled dataset of 100 multiple-choice instances spanning four evidence regimes: easy local retrieval, distributed evidence integration, conflict-sensitive update resolution, and position-sensitive retrieval. The same model is evaluated under three interfaces: FULL, RECENT, and a deterministic COMPRESSED interface that retains only the first sentence of each block. The main result is a clear regime-dependent dissociation: local robustness is perfectly preserved in the easy regime, while strong interface-specific failures emerge outside it. The note is strictly diagnostic and local in scope. It argues that local benchmark robustness under a reduced context interface does not imply global behavioral equivalence across evidence regimes.
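The three context interfaces can be illustrated with a minimal Python sketch. This is not the note's implementation: the function names, the regex-based sentence splitter, and in particular the definition of RECENT as "the last `k` blocks" are assumptions made for illustration, since the abstract only specifies that COMPRESSED keeps the first sentence of each block.

```python
import re
from typing import List

# Hypothetical helpers; names and the RECENT window size are assumptions.

# Split after '.', '!' or '?' when followed by whitespace (a naive heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def full_interface(blocks: List[str]) -> List[str]:
    """FULL: pass every context block through unchanged."""
    return list(blocks)


def recent_interface(blocks: List[str], k: int = 2) -> List[str]:
    """RECENT (assumed definition): keep only the last k blocks."""
    return list(blocks[-k:]) if k > 0 else []


def first_sentence(block: str) -> str:
    """Return the first sentence of a block under the naive splitter above."""
    return _SENTENCE_END.split(block.strip(), maxsplit=1)[0]


def compressed_interface(blocks: List[str]) -> List[str]:
    """COMPRESSED: deterministically keep the first sentence of each block."""
    return [first_sentence(b) for b in blocks]


if __name__ == "__main__":
    context = [
        "Alice has the key. She left at noon.",
        "Bob took the key later. He is home.",
        "The door is red.",
    ]
    print(compressed_interface(context))
    print(recent_interface(context, k=2))
```

In this toy context, evidence placed in the second sentence of a block ("She left at noon.") survives under FULL but is dropped under COMPRESSED, and evidence in the first block is dropped under RECENT with `k=2`. That is the kind of interface-specific loss the stress test is designed to expose.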
Danilo Tavella
Danilo Tavella (Fri,) studied this question.
www.synapsesocial.com/papers/69c8c2d1de0f0f753b39d481 — DOI: https://doi.org/10.5281/zenodo.19254806