We study whether instruction-tuned language-model behaviors have the same robustness profile under controlled weight perturbation. We inject per-tensor Gaussian noise into Llama-3.1-8B-Instruct and evaluate four capability families. Capabilities degrade heterogeneously (IFEval retains 82.2% at σ=0.2, GSM8K retains 45.1%; p=0.033). Safety refusal behavior shows qualitatively higher seed-level variance (CV=67.6% vs <13% for capabilities). A component-level sweep initially appeared to show capability-type dissociation; a modality control shows the stronger result is evaluation-modality dissociation: MMLU log-likelihood scoring retains 47.4% at σ=2.0 on layer_29_attention while greedy generation collapses to 4.4% with 93.1% extraction failure. Log-likelihood evaluation cannot certify deployment-relevant generation capability after weight modification. Preprint v2.1. Companion code: https://github.com/mohitdak24/perturbation-robustness-profiles
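The perturbation and the two summary statistics named in the abstract (retention and coefficient of variation across seeds) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the noise is assumed to be scaled by each tensor's own standard deviation, and CV is assumed to use the sample standard deviation. The companion repository is the authority on the actual definitions.

```python
import random
import statistics


def perturb_tensor(weights, sigma, rng):
    """Return a copy of one tensor (flattened to a list) with Gaussian noise added.

    Assumption: the noise std is sigma times the tensor's own population std,
    so sigma is a relative, per-tensor noise level.
    """
    scale = sigma * statistics.pstdev(weights)
    return [w + rng.gauss(0.0, scale) for w in weights]


def retention(perturbed_score, baseline_score):
    """Perturbed score as a percentage of the unperturbed baseline score."""
    return 100.0 * perturbed_score / baseline_score


def coefficient_of_variation(scores):
    """Seed-level CV in percent: sample std divided by mean (assumed definition)."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)


if __name__ == "__main__":
    rng = random.Random(0)              # fixed seed for reproducibility
    toy_tensor = [0.1, -0.2, 0.3, 0.05]  # hypothetical weights, not model data
    noisy = perturb_tensor(toy_tensor, sigma=0.2, rng=rng)
    print(len(noisy) == len(toy_tensor))
```

In a real run, `perturb_tensor` would be applied once per weight tensor with a fresh seed per trial, and `coefficient_of_variation` would be computed over the per-seed benchmark scores at a fixed σ.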
Prashi Badkur
Mohit Dak
Columbia University
London Business School
Indian Institute of Technology Bombay
Badkur et al. studied this question.
www.synapsesocial.com/papers/6a192e95fab5b468c4417c22 — DOI: https://doi.org/10.5281/zenodo.20403834