Attention Mechanism

Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.

Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of the input representations, where each weight reflects how relevant that input position is to the current output.
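The weighted combination can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the text above does not say how relevance is scored, so dot-product scoring is assumed here, and the names `softmax`, `attend`, `query`, `keys`, and `values`, along with the toy vectors, are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (weights, context) for one query over a sequence of inputs."""
    # Relevance of each input position: dot product with the query
    # (assumed scoring function; other choices are possible).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector: weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Three toy input vectors (hypothetical numbers) and one query.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, inputs, inputs)
```

Here the first and third inputs align with the query, so they receive larger weights than the second, and the context vector is pulled toward them. The weights are recomputed for every query, which is what lets the model look at different input positions for different predictions.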

Key takeaway:
Attention answers a simple question:
For the current prediction, which input tokens should the model focus on most?

Traditional encoder-decoder RNN models compress the full input sequence into a single fixed-length context vector.