Transformer #
An architecture of neural networks based on the multi-head attention mechanism.
Text is converted to numerical representations called tokens, and each token is mapped to a vector via lookup in a word embedding table.
Takes a text sequence as input and produces another text sequence as output.
Foundation of modern Large Language Models (LLMs) such as ChatGPT and Gemini.
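The pipeline above (tokens, embedding lookup, attention over the resulting vectors) can be sketched in NumPy. Everything here is illustrative: the tiny vocabulary, the random embedding table, and the unprojected single-head attention are assumptions for demonstration, not the full multi-head mechanism.

```python
import numpy as np

# Hypothetical tiny vocabulary and a random embedding table (illustrative only).
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4
rng = np.random.default_rng(0)
embed_table = rng.normal(size=(len(vocab), d_model))

# Text -> token ids -> embedding vectors via table lookup.
tokens = np.array([vocab["the"], vocab["cat"], vocab["sat"]])
X = embed_table[tokens]                      # (seq_len, d_model)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Self-attention: queries, keys, and values all come from the same sequence.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)                             # (3, 4)
```

A real multi-head layer would first apply learned linear projections to form per-head queries, keys, and values, then concatenate the head outputs.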
Transformer architecture #
Model, Positionwise Feed-Forward Networks, Residual Connection and Layer Normalization
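The positionwise feed-forward network with the "add & norm" pattern (residual connection followed by layer normalization) can be sketched in NumPy; all sizes and weights below are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def positionwise_ffn(x, W1, b1, W2, b2):
    # The same two-layer MLP applied independently at every position.
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2   # ReLU hidden layer

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 4, 8, 3
x = rng.normal(size=(seq_len, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

# "Add & norm": residual connection around the sublayer, then LayerNorm.
y = layer_norm(x + positionwise_ffn(x, W1, b1, W2, b2))
print(y.shape)  # (3, 4)
```

Note that, unlike batch normalization, layer normalization operates over the feature dimension of each position, so it works the same regardless of sequence length or batch size.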
Encoder and Decoder
Transformer block
Residual view of the Transformer
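In the residual view, each sublayer of a Transformer block only *adds* an update to a running "residual stream". A minimal pre-norm sketch in NumPy (single unprojected attention head, illustrative weights; a real block uses learned projections and multiple heads):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def self_attention(x):
    # Unprojected single-head attention, just to show the data flow.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ x

def ffn(x, W1, W2):
    return np.maximum(x @ W1, 0) @ W2

rng = np.random.default_rng(0)
d_model, seq_len = 4, 3
W1 = rng.normal(size=(d_model, 2 * d_model))
W2 = rng.normal(size=(2 * d_model, d_model))
x = rng.normal(size=(seq_len, d_model))

# Residual stream: each sublayer reads the stream and adds its update back.
h = x + self_attention(layer_norm(x))
out = h + ffn(layer_norm(h), W1, W2)
print(out.shape)  # (3, 4)
```

The residual connections give gradients a direct path through the stack, which is one reason deep Transformer stacks remain trainable.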
Transformers for Vision #
Model, Patch Embedding, Vision Transformer Encoder, Training and Evaluation
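Patch embedding turns an image into a token sequence: the image is cut into non-overlapping patches, each patch is flattened, and a linear projection maps it to the model dimension. A NumPy sketch with assumed sizes (32x32 RGB image, 8x8 patches):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 32x32 RGB image split into 8x8 patches.
H = W = 32; C = 3; P = 8; d_model = 16
img = rng.normal(size=(H, W, C))

# Cut the image into non-overlapping PxP patches and flatten each one.
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * C)      # (num_patches, P*P*C)

# Linear projection of the flattened patches = patch embedding.
W_embed = rng.normal(size=(P * P * C, d_model))
patch_embeddings = patches @ W_embed          # (16, 16)
print(patch_embeddings.shape)
```

The resulting sequence of patch embeddings is then fed to a standard Transformer encoder, just like a sequence of word embeddings.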
Large-Scale Pretraining with Transformers #
Encoder-Only, Encoder–Decoder, Decoder-Only
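A key distinction among these variants is masking: decoder-only (GPT-style) models use a causal mask so each position attends only to earlier positions, while encoder-only (BERT-style) models attend bidirectionally. A minimal NumPy sketch with placeholder (all-zero) attention scores:

```python
import numpy as np

seq_len = 4
# Causal (look-ahead) mask: position i may only attend to positions <= i.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

scores = np.zeros((seq_len, seq_len))         # placeholder attention scores
scores = np.where(mask, scores, -np.inf)      # block future positions

weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
print(weights.round(2))
# Row i is uniform over positions 0..i; future positions get weight 0.
```

With equal scores, each row spreads its attention uniformly over the visible prefix, so the zero entries above the diagonal are exactly what makes autoregressive generation possible.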
Scalability
Reference #
- Dive into Deep Learning. Cambridge University Press. (Ch. 11)
- R4, Ch. 10.7