AI

Ordinary Least Squares

Direct solution method - Ordinary Least Squares and the Line of Best Fit #

Revision:
OLS is the direct method for linear regression. It finds the best-fit line by minimising the sum of squared residuals without iterative updates.


Direct Method vs Iterative Method ☆ #

Linear regression parameters can be found in two main ways.

MethodMain ideaWhen used
Ordinary Least SquaresCompute the best parameters directlySmall or moderate datasets
Gradient DescentStart with parameters and update repeatedlyLarge datasets or many features
flowchart LR
    A["Linear Regression"] --> B["Direct Solution<br/>OLS"]
    A --> C["Iterative Solution<br/>Gradient Descent"]

    B --> B1["Normal Equation"]
    B --> B2["No learning rate"]
    B --> B3["One-shot solution"]

    C --> C1["Learning rate"]
    C --> C2["Repeated updates"]
    C --> C3["Stops after convergence"]

    style A fill:#E1F5FE,stroke:#5b7db1,color:#000
    style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
    style C fill:#FFF9C4,stroke:#b59b3b,color:#000
    style B1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style B2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style B3 fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style C1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style C2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style C3 fill:#EDE7F6,stroke:#8a6fb3,color:#000

Why It Is Called “Least Squares” ☆ #

OLS is called least squares because it chooses parameters that make the squared residual errors as small as possible.

Cost Function

Cost Function #

Revision:
A cost function converts model error into a single number. Training means changing the model parameters until this number becomes as small as possible.


Why Cost Function Matters in ML ☆ #

A machine learning model needs a way to decide whether one set of parameters is better than another.

For linear regression, every possible value of the parameters gives a different line. The cost function tells us which line is better by measuring how far the predictions are from the true values.

Gradient Descent Algorithm

Gradient Descent Algorithm #

Gradient Descent Algorithm (GDA) is

  • an optimisation method
  • used to train models
  • by repeatedly updating parameters (weights and biases) to reduce the loss

In deep learning, the default training approach is almost always mini-batch gradient descent, usually with Adam or SGD + momentum.

Gradient Descent is used in both regression and classification.

It’s not tied to the task type — it’s tied to the fact you have:

Gradient Descent

Gradient Descent for Linear Regression #

Revision:
Gradient descent is the step-by-step method for reducing the cost function when a direct closed-form solution is not convenient.


Where Gradient Descent Fits in ML ☆ #

Gradient descent is used when we want the model to learn parameters by repeatedly improving them.

For linear regression, it adjusts the slope and intercept until the prediction error becomes small.

flowchart LR
    A["Initial Parameters"] --> B["Make Predictions"]
    B --> C["Compute Cost"]
    C --> D["Compute Gradient"]
    D --> E["Update Parameters"]
    E --> B

    style A fill:#E1F5FE,stroke:#5b7db1,color:#000
    style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
    style C fill:#FFF9C4,stroke:#b59b3b,color:#000
    style D fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style E fill:#C8E6C9,stroke:#5f8f6a,color:#000

Core Idea ☆ #

The gradient tells us the direction in which the cost increases fastest.

LNN for Classification

Linear NN for Classification #

A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.

Neural-network view:

  • Binary classification → logistic regression (single neuron + sigmoid)
  • Multi-class classification → softmax regression (K output neurons + softmax)

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
  M --> A["Activation<br/>Sigmoid / Softmax"]
  A --> L["Loss<br/>Cross-entropy"]
  L --> O["Optimiser<br/>Mini-batch GD / Adam"]
  O --> P["Updated parameters<br/>w, b"]
  P --> I["Inference<br/>Probabilities → class"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px

Classification #

Classification predicts a discrete class label.
Common settings:

Classification (Linear)

Linear models for Classification #

  • categorises data by finding a linear boundary (hyperplane) that separates classes
  • calculating a weighted sum of input features plus bias
flowchart TD
T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
T --> LR["Logistic<br/>regression"]
T --> SVM["Linear<br/>SVM"]

P -->|uses| STEP["Step<br/>activation"]
LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
SVM -->|uses| HNG["Hinge<br/>loss"]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000

style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
  • Discriminant Functions
  • Decision Theory
  • Probabilistic Discriminative Classifiers
  • Logistic Regression

Logistic Regression #

  • Supervised machine learning algorithm
  • Binary classification algorithm
  • requires data to be linearly separable
  • predicts the probability that an input belongs to a specific class
  • uses Sigmoid function to convert inputs into a probability value between 0 and 1

Key takeaway: Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$, then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).

Hypothesis Testing

Hypothesis Testing #

Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.

It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.

Key takeaway:
Hypothesis testing is not about proving something with certainty.

It is about asking:

If the null hypothesis were true, how surprising would this sample result be?

Foundation Models

Foundation Model #

AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.

  • are large deep learning neural networks

  • are large AI models trained on massive and diverse datasets (text, images, audio, or multiple modalities).

  • Contain millions or billions of parameters.

  • designed to perform a broad range of general tasks

  • designed for general-purpose intelligence, not a single task.

  • acts as base models for building specialised AI applications

LLM - Model

LLM – Large Language Model #

Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.

They learn language by analysing massive amounts of text data, discovering patterns in:

  • grammar

  • meaning

  • context

  • relationships between words and sentences

  • Built on Deep Learning

  • Implemented using Neural Networks

  • Based on Transformers

  • Often combined with tools like:

    • Retrieval (RAG)
    • Agents
    • External APIs
    • Memory systems

What makes an LLM special? #

  • Built using deep neural networks
  • Trained on very large datasets (books, articles, code, web text)
  • Can perform many tasks without task-specific training
  • General-purpose language understanding, not single-task models

Foundation: Transformer Architecture #

LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.

AI Agents

AI Agents #

Also referred to as Agentic AI.

AI agents are intelligent systems that can plan, make decisions, and take actions to achieve goals with minimal human intervention.

  • A common use case is task automation

  • for example booking travel based on a user’s request.

  • AI agents typically build on Generative AI and use Large Language Models (LLMs) as the reasoning core.

  • Agents often interact with tools (APIs, databases, calendars) to complete multi-step workflows.