Research article: Grouped-Query Attention — Cache-Efficient Architecture Design
Oleh Ivchenko studied this question.