What question did this study set out to answer?

This research aims to investigate how different training methodologies impact attention head specialization in various large language models.

January 22, 2026 · Open Access

Alignment Robustness Depends More on Training than Architecture: A Cross-Vendor Analysis of Attention Specialization in Large Language Models

Key Points

Systematic empirical study of large language models across eight vendor families
Evaluation of attention head diversity using the Specialization Index (SI)
Comparison of base and instruction-tuned model pairs using standardized protocols
Robustness to alignment-induced specialization loss is linked to training methodology
Models with Sliding Window Attention maintain or increase specialization compared to those without
Grouped Query Attention exhibits significantly higher sensitivity to attention noise than Multi-Head Attention

Abstract

We present a systematic empirical study examining how preference optimization methods (RLHF, DPO) affect attention head specialization across eight vendor families and more than 25 large language model variants. Using a standardized evaluation protocol (bfloat16 precision, three-seed cross-validation, and SHA-256–verified prompts), we quantify attention head diversity via the Specialization Index (SI) and compare base and instruction-tuned model pairs.

Main finding: Robustness to alignment-induced specialization loss is strongly associated with training methodology, following a consistent hierarchy: Training Methodology > Sliding Window Attention > Architecture > Scale.

Key results:

SI reduction pattern: RLHF and DPO reduce SI in most model families lacking architectural protection (LLaMA-3.1: −56.3%; LLaMA-2: −7.95%), whereas models equipped with Sliding Window Attention maintain or increase specialization (Mistral: +4.2%).
Architecture-dependent sensitivity: At matched scale, Grouped Query Attention exhibits approximately 5,800× higher sensitivity to random attention noise than Multi-Head Attention (ratio-of-means across three seeds; permutation test, p < 0.05).
Training-based robustness: Synthetic training (Phi family) yields scale-invariant specialization (SI ≈ 0.33 across a 10.8× parameter range), and Qwen2 shows no observed recursive degradation within the tested 50-generation window.

This release includes 19 documented Jupyter notebooks that support the full experimental pipeline, 27 result JSON files, and command-line tools that enable end-to-end reproducibility. The paper text is released under CC-BY-4.0; accompanying code and tooling are released under the MIT License.
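The abstract mentions SHA-256–verified prompts but does not say how verification is performed. A minimal sketch of one way it could work follows, assuming prompts are stored as UTF-8 strings keyed by an identifier and checked against a manifest of expected digests. The function names, the manifest layout, and the example prompts are hypothetical and are not taken from the released tooling.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a prompt encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(prompts: dict[str, str]) -> dict[str, str]:
    """Map each prompt id to the digest of its text (hypothetical layout)."""
    return {pid: sha256_hex(text) for pid, text in prompts.items()}


def verify_prompts(prompts: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return the ids of prompts whose digest is missing or does not match."""
    return sorted(
        pid for pid, text in prompts.items()
        if manifest.get(pid) != sha256_hex(text)
    )


# Illustrative prompts only; these are not from the paper's prompt set.
prompts = {
    "p001": "Explain attention heads in one sentence.",
    "p002": "Summarize the following paragraph.",
}
manifest = build_manifest(prompts)

# An unmodified prompt set passes verification.
print(verify_prompts(prompts, manifest))   # []

# Changing a single character is detected.
tampered = dict(prompts, p002="Summarize the following paragraph!")
print(verify_prompts(tampered, manifest))  # ['p002']
```

Checking digests before each run is one way to ensure that every model and seed is evaluated on byte-identical prompts, so differences in SI are not caused by silent prompt drift.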
