Foundation Model #
Foundation models are large AI models trained on massive, diverse datasets (text, images, audio, or multiple modalities) to perform a wide range of tasks with minimal additional training.

- Large deep learning neural networks containing millions or billions of parameters.
- Designed for general-purpose intelligence, not a single task.
- Act as base models for building specialised AI applications.
- Use transfer learning, allowing knowledge learned from one task to be reused for others.
- Can be adapted to new tasks through fine-tuning or prompting.

Foundation models are trained once on diverse data and adapted many times to solve different tasks, as sketched in code after the diagram below.
```mermaid
flowchart LR
%% -------------------
%% Data Sources
%% -------------------
subgraph DATA_GROUP["Data"]
TEXT[Textual Data]
STRUCT[Structured Data]
SPEECH[Speech]
SIGNALS[3D Signals]
IMAGES[Images]
end
%% -------------------
%% Foundation Model
%% -------------------
subgraph FM_GROUP["Foundation Model"]
FM[Pre-trained Model]
end
%% -------------------
%% Adaptation Layer
%% -------------------
subgraph ADAPT_GROUP["Adaptation"]
FT[Fine-tuning]
PROMPT[Prompting]
RAG[RAG]
end
%% -------------------
%% Tasks
%% -------------------
subgraph TASKS_GROUP["Tasks"]
IE[Information Extraction]
OR[Object Recognition]
IF[Instruction Following]
IC[Image Captioning]
SA[Sentiment Analysis]
QA[Question Answering]
end
%% -------------------
%% Connections
%% -------------------
TEXT -->|Training| FM
STRUCT -->|Training| FM
SPEECH -->|Training| FM
SIGNALS -->|Training| FM
IMAGES -->|Training| FM
FM -->|Adaptation| FT
FM -->|Adaptation| PROMPT
FM -->|Adaptation| RAG
FT --> IE
FT --> SA
PROMPT --> IF
PROMPT --> QA
RAG --> IE
RAG --> QA
RAG --> IC
%% -------------------
%% Styling
%% -------------------
style TEXT fill:#C8E6C9
style STRUCT fill:#C8E6C9
style SPEECH fill:#C8E6C9
style SIGNALS fill:#C8E6C9
style IMAGES fill:#C8E6C9
style FM fill:#90CAF9
style FT fill:#BBDEFB
style PROMPT fill:#BBDEFB
style RAG fill:#BBDEFB
style IE fill:#FFCCBC
style OR fill:#FFCCBC
style IF fill:#FFCCBC
style IC fill:#FFCCBC
style SA fill:#FFCCBC
style QA fill:#FFCCBC
style DATA_GROUP stroke:none,fill:transparent
style FM_GROUP stroke:none,fill:transparent
style ADAPT_GROUP stroke:none,fill:transparent
style TASKS_GROUP stroke:none,fill:transparent
```
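To make "pre-train once, adapt many times" concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library is installed and using the small instruction-tuned checkpoint `google/flan-t5-small` (an illustrative choice, not prescribed by the diagram). The same pre-trained model handles two different tasks purely through prompting, with no retraining.

```python
from transformers import pipeline

# One pre-trained model, loaded once.
generator = pipeline("text2text-generation", model="google/flan-t5-small")

# Task 1: sentiment analysis, expressed as a prompt.
print(generator("Classify the sentiment as positive or negative: I loved this film."))

# Task 2: question answering, same model, different prompt.
print(generator("Answer the question: What is the capital of France?"))
```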
Traditional ML vs Foundation Models #
| Feature | Traditional ML Models | Foundation Models |
|---|---|---|
| Training data | Small, task-specific datasets | Massive, diverse datasets |
| Model size | Small to medium | Very large (millions/billions of parameters) |
| Purpose | Single, specific task | General-purpose |
| Reusability | Limited | High |
| Training approach | Train from scratch per task | Pre-train once, adapt many times |
| Transfer learning | Rare or minimal | Core design principle |
| Examples | Linear Regression, SVM, Decision Trees | GPT, BERT, CLIP |
Why Foundation Models Are Different #
- Traditional ML models are built for one problem at a time.
- Foundation models learn general representations of language, vision, or sound.
- This enables them to be reused across many applications with minimal additional training, as the sketch below illustrates.
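A hedged sketch of the contrast in the table above, assuming `scikit-learn` and `transformers` are installed: the traditional model must be trained from scratch on task-specific labels, while the foundation model is reused off the shelf.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from transformers import pipeline

# Traditional ML: a task-specific classifier trained from scratch.
texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
traditional = make_pipeline(TfidfVectorizer(), LogisticRegression())
traditional.fit(texts, labels)
print(traditional.predict(["really great"]))

# Foundation model: general representations reused with no task-specific
# training done here (downloads a default pre-trained sentiment model).
reused = pipeline("sentiment-analysis")
print(reused("really great"))
```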
Areas of Application #
- Natural Language Processing (NLP)
- Computer Vision
- Speech Recognition
- Multimodal AI (text + images + audio)
Common Foundation Model Examples #
GPT (Generative Pre-trained Transformer) #
- Focus: Text generation and understanding
- Tasks: Chatbots, summarisation, translation, code generation
- Example: ChatGPT
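A minimal text-generation sketch, assuming `transformers` is installed. `gpt2` is used here only because it is a small, freely available GPT-style checkpoint; production chatbots run much larger models behind an API.

```python
from transformers import pipeline

# Load a small GPT-style model for autoregressive text generation.
generator = pipeline("text-generation", model="gpt2")
result = generator("Foundation models are", max_new_tokens=20)
print(result[0]["generated_text"])
```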
BERT (Bidirectional Encoder Representations from Transformers) #
- Focus: Language understanding
- Tasks: Search, question answering, sentiment analysis
- Used heavily in search engines
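A sketch of BERT-style language understanding via two common `transformers` pipelines, assuming the library is installed; the default pipeline models are BERT-family checkpoints fine-tuned for each task.

```python
from transformers import pipeline

# Sentiment analysis with a default BERT-family sentiment model.
classifier = pipeline("sentiment-analysis")
print(classifier("The search results were exactly what I needed."))

# Extractive question answering: the model locates the answer span in context.
qa = pipeline("question-answering")
print(qa(question="What does BERT focus on?",
         context="BERT is a foundation model focused on language understanding."))
```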
CLIP (Contrastive Language–Image Pre-training) #
- Focus: Text + Image understanding
- Tasks: Image classification using text prompts
- Enables multimodal AI systems
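A sketch of zero-shot image classification with CLIP, assuming the `transformers` and `Pillow` libraries are installed; the image path and label set are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder: any local image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the labels and the image into CLIP's shared embedding space.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher logits mean a closer text-image match; softmax gives label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```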
Foundation Models in the AI Stack #
```mermaid
flowchart TB
DATA[Massive Datasets]
FM[Foundation Model]
TASK1[NLP Tasks]
TASK2[Vision Tasks]
TASK3[Speech Tasks]
DATA --> FM
FM --> TASK1
FM --> TASK2
FM --> TASK3
```
How Foundation Models Are Used #
- Pre-trained once on large datasets
- Adapted using:
  - Fine-tuning
  - Prompt engineering
  - Task-specific heads (a minimal sketch follows below)
- Serve as the backbone for modern AI systems, including Large Language Models (LLMs)
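A hedged sketch of adaptation via a task-specific head, assuming `transformers` is installed. `AutoModelForSequenceClassification` attaches a freshly initialised classification head to the pre-trained BERT backbone; in practice, a short fine-tuning run on labelled task data would follow (not shown).

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pre-trained backbone + new 2-class classification head (randomly initialised).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("Adapt me to a new task.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]): one logit per class
```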
FM Summary #
- Foundation models are general-purpose AI models.
- They power modern systems like LLMs and multimodal AI.
- They reduce cost, time, and complexity in AI development.
- They represent a major shift from task-specific ML to scalable intelligence.