Transformer

  • is a neural network architecture

  • based on the multi-head attention mechanism

  • text is converted into numerical representations called tokens, and each token is converted into a vector via a lookup in a word embedding table (see the sketch after this list)

  • takes a text sequence as input and produces another text sequence as output

  • is the foundation for modern Large Language Models (LLMs) such as ChatGPT and Gemini
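
A minimal sketch of the two ideas above: token IDs looked up in an embedding table, then passed through multi-head scaled dot-product attention. This is an illustrative NumPy example, not the source's code; the sizes, variable names, and random weights are assumptions, and real Transformers add positional encodings, feed-forward layers, residual connections, and layer normalization.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, n_heads = 100, 16, 4
d_head = d_model // n_heads

# Word embedding table: one d_model-dimensional vector per token ID.
embedding_table = rng.normal(size=(vocab_size, d_model))

# Toy "tokenized" input: a short sequence of token IDs.
token_ids = np.array([5, 42, 7, 99])
x = embedding_table[token_ids]              # (seq_len, d_model) via lookup

# Per-head projections for queries, keys, values, plus an output projection.
W_q = rng.normal(size=(n_heads, d_model, d_head))
W_k = rng.normal(size=(n_heads, d_model, d_head))
W_v = rng.normal(size=(n_heads, d_model, d_head))
W_o = rng.normal(size=(n_heads * d_head, d_model))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

heads = []
for h in range(n_heads):
    q, k, v = x @ W_q[h], x @ W_k[h], x @ W_v[h]   # (seq_len, d_head) each
    scores = q @ k.T / np.sqrt(d_head)             # scaled dot-product scores
    weights = softmax(scores, axis=-1)             # each row sums to 1
    heads.append(weights @ v)                      # weighted sum of values

out = np.concatenate(heads, axis=-1) @ W_o         # (seq_len, d_model)
print(out.shape)                                   # (4, 16)
```

Each head attends to the whole sequence with its own learned projections; concatenating the heads and projecting back to d_model is what makes the attention "multi-head".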

