Attention Mechanism
Attention Mechanism #
Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.
Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of useful information.
Key takeaway:
Attention answers a simple question:For the current prediction, which input tokens should the model focus on most?
- Queries, Keys, and Values
- Attention Pooling by Similarity
- Attention Pooling via Nadaraya–Watson Regression
- Attention Scoring Functions
- Dot Product Attention
- Convenience Functions
- Scaled Dot Product Attention
- Additive Attention
- Bahdanau Attention Mechanism
- Multi-Head Attention
- Self-Attention
- Positional Encoding
Why Attention Is Needed ☆ #
Traditional encoder-decoder RNN models compress the full input sequence into one context vector.