Foundation Models

Foundation Model #

AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.

  • Are large deep learning neural networks trained on massive, diverse datasets (text, images, audio, or multiple modalities).

  • Contain millions or billions of parameters.

  • Are designed for general-purpose intelligence: a broad range of tasks rather than a single one.

  • Act as base models for building specialised AI applications.

  • Use transfer learning, so knowledge learned from one task can be reused for others.

  • Can be adapted using fine-tuning or prompting (a minimal prompting sketch follows the diagram below).

Foundation models are trained once on diverse data and adapted many times to solve different tasks.


flowchart LR

    %% -------------------
    %% Data Sources
    %% -------------------
    subgraph DATA_GROUP["Data"]
        TEXT[Textual Data]
        STRUCT[Structured Data]
        SPEECH[Speech]
        SIGNALS[3D Signals]
        IMAGES[Images]
    end

    %% -------------------
    %% Foundation Model
    %% -------------------
    subgraph FM_GROUP["Foundation Model"]
        FM[Pre-trained Model]
    end

    %% -------------------
    %% Adaptation Layer
    %% -------------------
    subgraph ADAPT_GROUP["Adaptation"]
        FT[Fine-tuning]
        PROMPT[Prompting]
        RAG[RAG]
    end

    %% -------------------
    %% Tasks
    %% -------------------
    subgraph TASKS_GROUP["Tasks"]
        IE[Information Extraction]
        OR[Object Recognition]
        IF[Instruction Following]
        IC[Image Captioning]
        SA[Sentiment Analysis]
        QA[Question Answering]
    end

    %% -------------------
    %% Connections
    %% -------------------
    TEXT -->|Training| FM
    STRUCT -->|Training| FM
    SPEECH -->|Training| FM
    SIGNALS -->|Training| FM
    IMAGES -->|Training| FM

    FM -->|Adaptation| FT
    FM -->|Adaptation| PROMPT
    FM -->|Adaptation| RAG

    FT --> IE
    FT --> SA

    PROMPT --> IF
    PROMPT --> QA

    RAG --> IE
    RAG --> QA
    RAG --> IC

    %% -------------------
    %% Styling
    %% -------------------
    style TEXT fill:#C8E6C9
    style STRUCT fill:#C8E6C9
    style SPEECH fill:#C8E6C9
    style SIGNALS fill:#C8E6C9
    style IMAGES fill:#C8E6C9

    style FM fill:#90CAF9

    style FT fill:#BBDEFB
    style PROMPT fill:#BBDEFB
    style RAG fill:#BBDEFB

    style IE fill:#FFCCBC
    style OR fill:#FFCCBC
    style IF fill:#FFCCBC
    style IC fill:#FFCCBC
    style SA fill:#FFCCBC
    style QA fill:#FFCCBC

    style DATA_GROUP stroke:none,fill:transparent
    style FM_GROUP stroke:none,fill:transparent
    style ADAPT_GROUP stroke:none,fill:transparent
    style TASKS_GROUP stroke:none,fill:transparent
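One pre-trained model can be steered toward different tasks purely by changing the prompt, with no retraining. A minimal sketch of prompting-based adaptation, assuming the Hugging Face transformers library is installed; the small gpt2 checkpoint is only a stand-in for a real foundation model, so its outputs are illustrative at best:

```python
from transformers import pipeline

# One pre-trained model, loaded once (gpt2 is a small stand-in for a foundation model).
generator = pipeline("text-generation", model="gpt2")

# Task 1: question answering, expressed purely as a prompt.
qa_prompt = "Q: What is the capital of France?\nA:"
print(generator(qa_prompt, max_new_tokens=20)[0]["generated_text"])

# Task 2: summarisation, again expressed only as a prompt to the same model.
summary_prompt = (
    "Summarise in one sentence: Foundation models are trained once on "
    "diverse data and adapted many times to solve different tasks.\nSummary:"
)
print(generator(summary_prompt, max_new_tokens=30)[0]["generated_text"])
```

The model weights never change between the two tasks; only the input text does.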

Traditional ML vs Foundation Models #

| Feature | Traditional ML Models | Foundation Models |
| --- | --- | --- |
| Training data | Small, task-specific datasets | Massive, diverse datasets |
| Model size | Small to medium | Very large (millions/billions of parameters) |
| Purpose | Single, specific task | General-purpose |
| Reusability | Limited | High |
| Training approach | Train from scratch per task | Pre-train once, adapt many times |
| Transfer learning | Rare or minimal | Core design principle |
| Examples | Linear Regression, SVM, Decision Trees | GPT, BERT, CLIP |
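The "Training approach" row is the key practical difference. A minimal contrast sketch, assuming scikit-learn and transformers are installed; the four-example training set and the default sentiment checkpoint are purely illustrative:

```python
# Traditional ML: train a task-specific model from scratch on labelled data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (tiny toy dataset)

traditional = make_pipeline(TfidfVectorizer(), LogisticRegression())
traditional.fit(texts, labels)  # training happens here, once per task

# Foundation model: reuse a model already pre-trained on massive text corpora.
from transformers import pipeline

pretrained = pipeline("sentiment-analysis")  # downloads a pre-trained checkpoint

print(traditional.predict(["good value"]))
print(pretrained("good value"))
```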

Why Foundation Models Are Different #

  • Traditional ML models are built for one problem at a time.
  • Foundation models learn general representations of language, vision, or sound.
  • This enables them to be reused across many applications with minimal additional training (see the embedding sketch below).
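One way to see "general representations" concretely is to take embeddings from a pre-trained encoder and reuse them, without any task-specific training, for more than one purpose. A minimal sketch, assuming transformers and PyTorch are installed and using bert-base-uncased as an illustrative checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a pre-trained encoder once; its representations come from massive text corpora.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden state into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0)

query = embed("How do I reset my password?")
doc_a = embed("Steps to recover a forgotten account password.")
doc_b = embed("Our office is closed on public holidays.")

# The same general-purpose vectors can back semantic search, duplicate detection,
# clustering, and more -- no retraining, just reuse of the representations.
cos = torch.nn.functional.cosine_similarity
print("query vs doc_a:", cos(query, doc_a, dim=0).item())
print("query vs doc_b:", cos(query, doc_b, dim=0).item())
```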

Areas of Application #

  • Natural Language Processing (NLP)
  • Computer Vision
  • Speech Recognition
  • Multimodal AI (text + images + audio)

Common Foundation Model Examples #

GPT (Generative Pre-trained Transformer) #

  • Focus: Text generation and understanding
  • Tasks: Chatbots, summarisation, translation, code generation
  • Example: ChatGPT

BERT (Bidirectional Encoder Representations from Transformers) #

  • Focus: Language understanding
  • Tasks: Search, question answering, sentiment analysis (question answering is shown in the example below)
  • Used heavily in search engines
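A minimal sketch of extractive question answering with a BERT-family model, assuming transformers is installed; the pipeline downloads a default SQuAD-fine-tuned checkpoint, and the context text is just an example:

```python
from transformers import pipeline

# The default checkpoint for this pipeline is a BERT-family model fine-tuned on SQuAD.
qa = pipeline("question-answering")

result = qa(
    question="What are foundation models adapted with?",
    context=(
        "Foundation models are pre-trained once on massive datasets and then "
        "adapted to downstream tasks using fine-tuning, prompting, or RAG."
    ),
)
print(result)  # dict with 'score', 'start', 'end', and the extracted 'answer'
```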

CLIP (Contrastive Language–Image Pretraining) #

  • Focus: Text + Image understanding
  • Tasks: Image classification using text prompts (see the zero-shot example below)
  • Enables multimodal AI systems
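A minimal sketch of zero-shot image classification with CLIP, assuming transformers, PyTorch, and Pillow are installed; openai/clip-vit-base-patch32 is the public checkpoint, and photo.jpg is a placeholder for any local image:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path: any local image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the image and the candidate text labels into the same embedding space.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher image-text similarity => more likely label; no task-specific training needed.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```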

Foundation Models in the AI Stack #

flowchart TB
    DATA[Massive Datasets]
    FM[Foundation Model]
    TASK1[NLP Tasks]
    TASK2[Vision Tasks]
    TASK3[Speech Tasks]

    DATA --> FM
    FM --> TASK1
    FM --> TASK2
    FM --> TASK3

How Foundation Models Are Used #

  • Pre-trained once on large datasets
  • Adapted using:
    • Fine-tuning
    • Prompt engineering
    • Task-specific heads (see the sketch after this list)
  • Serve as the backbone for modern AI systems, including Large Language Models (LLMs)
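A minimal sketch of attaching a task-specific head to a pre-trained encoder, assuming transformers is installed; bert-base-uncased and the three-label setup are illustrative, and the newly initialised head is what fine-tuning would train:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse the pre-trained encoder; a new, randomly initialised classification head
# with 3 output labels is attached on top (transformers warns about this at load time).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("Foundation models are reusable.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 3); meaningless until fine-tuned
print(logits)

# Fine-tuning (e.g. with transformers' Trainer or a plain PyTorch loop) would then
# update the head -- and optionally the encoder -- on task-specific labelled data.
```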

FM Summary #

  • Foundation models are general-purpose AI models.
  • They power modern systems like LLMs and multimodal AI.
  • They reduce cost, time, and complexity in AI development.
  • They represent a major shift from task-specific ML to scalable intelligence.
