Navigating the Attention Landscape: MHA, MQA, and GQA Decoded

Shobhit Agarwal
5 min read · Jan 4, 2024

Attention mechanisms are the driving force behind many of today’s cutting-edge Large Language Models. They allow these models to focus on relevant parts of an input sequence, like a sentence or document, and extract meaning with astonishing accuracy. But with different flavours of attention popping up, things can get confusing. Today, we’ll explore three key players:

  1. Multi-Head Attention (MHA)
  2. Multi-Query Attention (MQA)
  3. Grouped-Query Attention (GQA)

Whether you’re a complete beginner or someone with a basic understanding of NLP, this article is designed to give you clear explanations of MHA, MQA and GQA.

The Attention Spotlight:

  • Attention mechanisms focus on relevant parts of an input sequence like sentences or documents.
  • Imagine a magician pulling a rabbit out of a hat — attention mechanisms selectively “reveal” important information.
  • Attention plays a crucial role in tasks like understanding sentiment, translating languages, and answering questions (a minimal code sketch of the core mechanism follows this list).
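
To ground the idea, here is a minimal sketch of the scaled dot-product attention that all three variants build on. This is an illustrative PyTorch snippet, not code from any particular model; the tensor shapes and names are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d_k = q.size(-1)
    # Similarity score between every query and every key
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Softmax turns scores into weights that sum to 1 for each query
    weights = F.softmax(scores, dim=-1)
    # Each output token is a weighted mix of the value vectors
    return weights @ v

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)  # torch.Size([1, 4, 8])
```

The softmax weights are exactly the "spotlight": they decide how much each token looks at every other token before mixing their values together.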
1. Multi-Head Attention: The Star Player

  • Point 1: MHA uses multiple “heads” to attend to different aspects of the input simultaneously, like a multi-tasking detective (see the sketch after this list).
  • Point 2: Think of it as reading a newspaper with multiple headlines — MHA captures various…
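
To make the "multiple heads" idea concrete, here is a minimal PyTorch sketch of multi-head self-attention. The class name, projection layout and sizes are illustrative assumptions, not the implementation of any specific model.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal MHA: every head has its own slice of the Q, K and V projections."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection per role; split into heads in forward()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, _ = x.shape
        # Reshape (batch, seq, d_model) -> (batch, heads, seq, d_head)
        q = self.q_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        # Each head attends to the whole sequence independently
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        weights = scores.softmax(dim=-1)
        out = weights @ v                      # (batch, heads, seq, d_head)
        # Concatenate the heads and mix them with a final projection
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)

mha = MultiHeadAttention(d_model=64, num_heads=8)
x = torch.randn(2, 10, 64)   # batch of 2, sequence length 10
print(mha(x).shape)          # torch.Size([2, 10, 64])
```

Note that in MHA every head carries its own K and V projections; MQA and GQA differ mainly in how many K/V heads they keep, which is what the rest of the article explores.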
