Transformer

  • is a neural network architecture based on the multi-head attention mechanism

  • converts text into numerical representations called tokens, and maps each token to a vector by lookup in a word embedding table

  • takes a text sequence as input and produces another text sequence as output

  • is the foundation for modern Large Language Models (LLMs) such as ChatGPT and Gemini

  • Transformer architecture

      • Model, Positionwise Feed-Forward Networks, Residual Connection and Layer Normalization

      • Encoder and Decoder

      • Transformer block

      • Residual view of the transformer

  • Transformers for Vision

      • Model, Patch Embedding, Vision Transformer Encoder, Training and Evaluation

  • Large-Scale Pretraining with Transformers

      • Encoder-Only, Encoder–Decoder, Decoder-Only

      • Scalability
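
The multi-head attention mechanism mentioned above can be sketched in a few lines of numpy: queries, keys, and values are projected, split across heads, attended to with scaled dot-products, and the head outputs are concatenated. This is a minimal illustrative sketch (random weights, no masking, dropout, or batching), not the book's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # Split d_model into num_heads heads, attend per head, concatenate.
    n, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    split = lambda M: M.reshape(n, num_heads, d_head).transpose(1, 0, 2)
    heads = scaled_dot_product_attention(split(Q), split(K), split(V))
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ W_o

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                                # 6 tokens, d_model = 8
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2)
print(out.shape)  # (6, 8)
```

Each row of the attention-weight matrix is a probability distribution over the input tokens, which is what lets every output position mix information from the whole sequence.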
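
The token-to-vector step is just a table lookup: each token id indexes a row of the embedding table. A minimal sketch, assuming a toy whitespace tokenizer and a hypothetical three-word vocabulary (the table values are random and purely illustrative):

```python
import numpy as np

# Hypothetical toy vocabulary; real tokenizers use subword units.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

def embed(text):
    """Tokenize by whitespace, map tokens to ids, then look up vectors."""
    token_ids = [vocab[w] for w in text.split()]
    return embedding_table[token_ids]  # shape: (num_tokens, d_model)

vectors = embed("the cat sat")
print(vectors.shape)  # (3, 4)
```

In a real model the table is a learned parameter, and a positional encoding is added to these vectors before they enter the first Transformer block.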
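
For Transformers for Vision, the patch embedding step plays the role the word embedding plays for text: the image is cut into non-overlapping patches, each patch is flattened, and a shared linear map projects it to d_model. A minimal numpy sketch (the zero image and zero weight matrix are placeholders for learned parameters):

```python
import numpy as np

def patch_embed(image, patch_size, W):
    """Split an H x W x C image into non-overlapping patches, flatten each,
    and project every patch to d_model with a shared linear map W."""
    H, Wd, C = image.shape
    p = patch_size
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = image.reshape(H // p, p, Wd // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p * p * C)
    return patches @ W  # (num_patches, d_model)

img = np.zeros((8, 8, 3))          # tiny 8x8 RGB image
W = np.zeros((4 * 4 * 3, 16))      # projects 4x4x3 patches to d_model = 16
tokens = patch_embed(img, patch_size=4, W=W)
print(tokens.shape)  # (4, 16)
```

The resulting patch tokens are then fed to a standard Transformer encoder, exactly as word-embedding vectors are for text.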


Reference

  • Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2023). Dive into Deep Learning. Cambridge University Press. (Ch. 11)
  • R4 - Ch. 10.7