Demystifying GQA - Grouped Query Attention for Efficient LLM Pre-training | Towards Data Science
The variant of multi-head attention powering LLMs like LLaMA-2, Mistral 7B, and others.
