Grouped-Query Attention (GQA) is an attention mechanism that interpolates between Multi-Head Attention (MHA) and Multi-Query Attention (MQA).
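The interpolation can be made concrete by counting key/value heads. This is a small sketch that assumes H query heads split into G groups, with one key/value head per group. It also assumes a hypothetical model shape (32 query heads, head dimension 128, 32 layers) and computes how many key/value elements each variant caches per token. The helper name is illustrative only.

```python
def kv_cache_elements_per_token(num_kv_heads: int, head_dim: int, num_layers: int) -> int:
    # Each layer caches one key vector and one value vector (hence the factor 2)
    # of length head_dim for every distinct key/value head.
    return 2 * num_kv_heads * head_dim * num_layers


# Hypothetical model shape (assumption, not from the paper):
# H = 32 query heads, head_dim = 128, 32 layers.
H, HEAD_DIM, LAYERS = 32, 128, 32

# G = number of query-head groups = number of distinct key/value heads.
for name, G in [("MHA (G = H)", H), ("GQA (G = 8)", 8), ("MQA (G = 1)", 1)]:
    assert H % G == 0, "query heads must split evenly into groups"
    elems = kv_cache_elements_per_token(G, HEAD_DIM, LAYERS)
    print(f"{name}: {G} KV heads, {elems} cached elements per token")
```

Under these assumptions the three variants cache 262144, 65536 and 8192 elements per token respectively. GQA sits between the two extremes, and G controls where.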

Paper Link: https://arxiv.org/abs/2305.13245

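Grouped-Query Attention divides the H query heads into G groups. All query heads within a group share a single key head and a single value head. With G = 1 this reduces to MQA, and with G = H it is equivalent to MHA.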
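The grouping can be illustrated with a minimal pure-Python sketch of the attention step. It assumes the queries, keys and values have already been projected into heads. It omits causal masking and the output projection, and it assumes H is divisible by G. The function names `attend` and `grouped_query_attention` are hypothetical. In this sketch, query head h reads key/value head h // (H / G).

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, ks, vs):
    """Scaled dot-product attention of one query vector over T keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))]


def grouped_query_attention(q_heads, k_heads, v_heads):
    """
    q_heads: H query heads, each a list of T query vectors.
    k_heads, v_heads: G key/value heads, each a list of T vectors.
    Returns H output heads, each a list of T vectors.
    """
    H, G = len(q_heads), len(k_heads)
    assert len(v_heads) == G and H % G == 0, "H must be divisible by G"
    group_size = H // G
    out = []
    for h, q_seq in enumerate(q_heads):
        g = h // group_size  # the key/value head shared by this query head's group
        out.append([attend(q, k_heads[g], v_heads[g]) for q in q_seq])
    return out


# Usage: H = 4 query heads in G = 2 groups, T = 3 positions, head dim D = 2.
random.seed(0)
H, G, T, D = 4, 2, 3, 2


def rand_seq():
    return [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]


q = [rand_seq() for _ in range(H)]
k = [rand_seq() for _ in range(G)]
v = [rand_seq() for _ in range(G)]
out = grouped_query_attention(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))  # 4 3 2
```

Passing a single key/value head gives MQA, and passing H key/value heads gives MHA. Each group simply reuses one key/value head, so the result matches MHA run with every key/value head repeated H / G times. That repetition is a common way to implement GQA on top of a standard multi-head attention kernel.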

Figure: Grouped-query attention shares a single key head and value head across each group of query heads.

Advantages