Transformer
A transformer is a neural network architecture that uses attention as its main mechanism for processing sequences.
Unlike RNNs, transformers do not process tokens one by one.
They process many tokens in parallel and use self-attention to learn relationships between tokens.
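As a minimal sketch of the idea above, the following computes single-head scaled dot-product self-attention over a whole sequence at once. It uses only the Python standard library so it runs without dependencies. The names `self_attention`, `matmul`, `w_q`, `w_k`, and `w_v` are illustrative assumptions, and the identity projection matrices are chosen only to keep the numbers easy to follow. Masking, positional information, and multiple heads are left out.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    # Project every token to a query, key, and value in one step each.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Each row holds one token's scaled similarity to every token.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of all value vectors.
    return matmul(weights, v), weights

# Three toy token vectors of dimension 2 (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

Every row of `weights` is computed from the same matrix products, with no dependence on the previous token's result. This is what lets the whole sequence be processed in parallel, in contrast to the step-by-step recurrence of an RNN.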
The architecture is based on the multi-head attention mechanism.
Text is first converted into numerical representations called tokens, and each token is then mapped to a vector by lookup in a word embedding table.
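The token and embedding lookup steps can be sketched as follows. This is a toy illustration under stated assumptions: the vocabulary, the whitespace tokenizer, the `<unk>` fallback, and the randomly initialized table are all hypothetical. Real models typically use a subword tokenizer and an embedding table learned during training.

```python
import random

# Hypothetical toy vocabulary mapping words to integer token ids.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
embedding_dim = 4

# One row per token id; random values stand in for learned weights.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocab
]

def tokenize(text):
    # Assumed whitespace tokenizer; unknown words map to <unk>.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    # Embedding lookup is plain row indexing into the table.
    return [embedding_table[i] for i in token_ids]

token_ids = tokenize("The cat sat")
vectors = embed(token_ids)
```

Here `token_ids` holds one integer per word and `vectors` holds one `embedding_dim`-sized row per token. A sequence of such vectors is what the attention layers take as input.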