This systematic analysis of attention memory patterns in transformer-based large language models examines head specialization, attention sink phenomena, information density gradients across layers, and key-value (KV) redundancy patterns that inform KV-cache compression strategies.
Oleh Ivchenko (Thu,) studied this question.