Multi-Head Attention

Attention Mechanism

Attention Mechanism #

Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.

Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of useful information.

Key takeaway:
Attention answers a simple question:

For the current prediction, which input tokens should the model focus on most?

  • Queries, Keys, and Values
  • Attention Pooling by Similarity
  • Attention Pooling via Nadaraya–Watson Regression
  • Attention Scoring Functions
  • Dot Product Attention
  • Convenience Functions
  • Scaled Dot Product Attention
  • Additive Attention
  • Bahdanau Attention Mechanism
  • Multi-Head Attention
  • Self-Attention
  • Positional Encoding

Why Attention Is Needed ☆ #

Traditional encoder-decoder RNN models compress the full input sequence into one context vector.