Attention Mechanism

Attention Mechanism #

Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.

Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of useful information.

Key takeaway:
Attention answers a simple question:
For the current prediction, which input tokens should the model focus on most?

Traditional encoder-decoder RNN models compress the full input sequence into one context vector.

A transformer is a neural network architecture that uses attention as its main mechanism for processing sequences.

Unlike RNNs, transformers do not process tokens one by one.

They process many tokens in parallel and use self-attention to learn relationships between tokens.

is an architecture of neural networks
based on the multi-head attention mechanism
text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table