February 21, 2026
Direct solution method - Ordinary Least Squares and the Line of Best Fit
Revision:
OLS is the direct method for linear regression. It finds the best-fit line by minimising the sum of squared residuals without iterative updates.
Direct Method vs Iterative Method ☆
Linear regression parameters can be found in two main ways.
| Method | Main idea | When used |
|---|---|---|
| Ordinary Least Squares | Compute the best parameters directly | Small or moderate datasets |
| Gradient Descent | Start with parameters and update repeatedly | Large datasets or many features |
```mermaid
flowchart LR
A["Linear Regression"] --> B["Direct Solution<br/>OLS"]
A --> C["Iterative Solution<br/>Gradient Descent"]
B --> B1["Normal Equation"]
B --> B2["No learning rate"]
B --> B3["One-shot solution"]
C --> C1["Learning rate"]
C --> C2["Repeated updates"]
C --> C3["Stops after convergence"]
style A fill:#E1F5FE,stroke:#5b7db1,color:#000
style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
style C fill:#FFF9C4,stroke:#b59b3b,color:#000
style B1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style B2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style B3 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C3 fill:#EDE7F6,stroke:#8a6fb3,color:#000
```
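For reference, the normal equation behind the direct solution is usually written as below. The symbols are assumptions introduced here, not defined in the original text: $X$ is the design matrix (with a leading column of ones for the intercept), $y$ is the vector of targets, and $X^\top X$ is assumed to be invertible.

$$
\hat{\theta} = (X^\top X)^{-1} X^\top y
$$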
Why It Is Called “Least Squares” ☆
OLS is called least squares because it chooses parameters that make the squared residual errors as small as possible.
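A minimal sketch of the idea for a single feature, assuming the standard closed-form formulas (slope = covariance of x and y divided by variance of x). The function names `fit_ols` and `sum_squared_residuals` and the toy data are illustrative only.

```python
def fit_ols(xs, ys):
    """Closed-form OLS for one feature; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that OLS makes as small as possible."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_ols(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# Any other line, for example a slightly steeper one, has a larger sum.
worse = sum_squared_residuals(xs, ys, slope + 0.5, intercept)
```

No learning rate or loop over updates appears anywhere: the parameters come out of one calculation.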
February 21, 2026
Cost Function
Revision:
A cost function converts model error into a single number. Training means changing the model parameters until this number becomes as small as possible.
Why Cost Function Matters in ML ☆
A machine learning model needs a way to decide whether one set of parameters is better than another.
For linear regression, every possible value of the parameters gives a different line.
The cost function tells us which line is better by measuring how far the predictions are from the true values.
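As a small illustration, the sketch below scores two candidate lines on the same data. It assumes mean squared error as the cost, which is a common choice for linear regression but is not named in the text above; `mse_cost` and the data are hypothetical.

```python
def mse_cost(slope, intercept, xs, ys):
    """Mean squared error of the line y = slope * x + intercept."""
    n = len(xs)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n


# Toy data lying exactly on y = 2x (made up for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cost_a = mse_cost(2.0, 0.0, xs, ys)  # candidate line y = 2x
cost_b = mse_cost(1.0, 0.0, xs, ys)  # candidate line y = x

# The line with the lower cost is the better fit.
better = "a" if cost_a < cost_b else "b"
```

Here line `a` passes through every point, so its cost is zero, while line `b` has a cost of 14/3.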
February 26, 2026
Gradient Descent Algorithm
Gradient Descent Algorithm (GDA) is
- an optimisation method
- used to train models
- by repeatedly updating parameters (weights and biases) to reduce the loss
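The update in the last bullet is commonly written as follows. The notation is an assumption made here: $\theta$ stands for all parameters, $\eta$ for the learning rate, and $L$ for the loss.

$$
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)
$$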
In deep learning, training almost always uses mini-batch gradient descent, typically through an optimiser such as Adam or SGD with momentum.
Gradient Descent is used in both regression and classification.
It’s not tied to the task type — it’s tied to the fact you have:
February 21, 2026
Gradient Descent for Linear Regression
Revision:
Gradient descent is the step-by-step method for reducing the cost function when a direct closed-form solution is not convenient.
Where Gradient Descent Fits in ML ☆
Gradient descent is used when we want the model to learn parameters by repeatedly improving them.
For linear regression, it adjusts the slope and intercept until the prediction error becomes small.
```mermaid
flowchart LR
A["Initial Parameters"] --> B["Make Predictions"]
B --> C["Compute Cost"]
C --> D["Compute Gradient"]
D --> E["Update Parameters"]
E --> B
style A fill:#E1F5FE,stroke:#5b7db1,color:#000
style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
style C fill:#FFF9C4,stroke:#b59b3b,color:#000
style D fill:#EDE7F6,stroke:#8a6fb3,color:#000
style E fill:#C8E6C9,stroke:#5f8f6a,color:#000
```
Core Idea ☆
The gradient tells us the direction in which the cost increases fastest, so gradient descent moves the parameters in the opposite direction.
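A minimal sketch of the loop for a line with one slope and one intercept, stepping against the gradient at every iteration. It assumes mean squared error as the cost; the learning rate, step count, function name, and data are illustrative choices rather than recommended values.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by repeated gradient steps on the MSE."""
    slope, intercept = 0.0, 0.0  # initial parameters
    n = len(xs)
    for _ in range(steps):
        # Make predictions and measure the error on every point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Step against the gradient, because the gradient points uphill.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = gradient_descent(xs, ys)
```

On this data the loop settles close to a slope of 2 and an intercept of 1. A learning rate that is too large would make the updates diverge instead.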
February 15, 2026
Linear NN for Classification
A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.
Neural-network view:
- Binary classification → logistic regression (single neuron + sigmoid)
- Multi-class classification → softmax regression (K output neurons + softmax)
```mermaid
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
M --> A["Activation<br/>Sigmoid / Softmax"]
A --> L["Loss<br/>Cross-entropy"]
L --> O["Optimiser<br/>Mini-batch GD / Adam"]
O --> P["Updated parameters<br/>w, b"]
P --> I["Inference<br/>Probabilities → class"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
```
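The forward part of this pipeline (linear model, softmax, cross-entropy, predicted class) can be sketched for a three-class case as below. The weights, biases, and input are hand-picked for illustration rather than learned, and the helper names are hypothetical. The optimiser step is left out.

```python
import math


def linear_scores(weights, biases, x):
    """One linear score per class: w_k . x + b_k (one weight row per class)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def softmax(scores):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[true_class])


# Hand-picked parameters for K = 3 classes and 2 input features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

probs = softmax(linear_scores(weights, biases, x))
loss = cross_entropy(probs, true_class=0)
# Inference: pick the class with the highest probability.
predicted = max(range(len(probs)), key=lambda k: probs[k])
```

For the binary case, the same structure reduces to a single score passed through a sigmoid.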
Classification
Classification predicts a discrete class label.
Common settings:
Linear models for Classification
- categorise data by finding a linear boundary (hyperplane) that separates the classes
- score each input by calculating a weighted sum of its features plus a bias
```mermaid
flowchart TD
T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
T --> LR["Logistic<br/>regression"]
T --> SVM["Linear<br/>SVM"]
P -->|uses| STEP["Step<br/>activation"]
LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
SVM -->|uses| HNG["Hinge<br/>loss"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000
style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
```
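All three models share the same linear score and differ in what they do with it. The sketch below assumes labels encoded as -1 and +1 (a common convention, not stated in the text) and uses made-up weights; the function names are hypothetical.

```python
import math


def score(w, b, x):
    """Shared linear score: weighted sum of the features plus a bias."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


def perceptron_predict(z):
    """Step activation: the class is the sign of the score."""
    return 1 if z >= 0 else -1


def log_loss(z, y):
    """Logistic (log) loss for a label y in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * z))


def hinge_loss(z, y):
    """Hinge loss: zero once the example is on the correct side with margin >= 1."""
    return max(0.0, 1.0 - y * z)


w, b = [1.0, -1.0], 0.0  # illustrative parameters
x = [3.0, 1.0]
z = score(w, b, x)  # 3.0 - 1.0 + 0.0 = 2.0

prediction = perceptron_predict(z)
hinge_if_positive = hinge_loss(z, +1)  # 0.0: correct with enough margin
hinge_if_negative = hinge_loss(z, -1)  # 3.0: wrong side of the boundary
log_if_positive = log_loss(z, +1)      # small but never exactly zero
```

The hinge loss switches off completely for confidently correct examples, whereas the log loss keeps a small penalty.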
- Discriminant Functions
- Decision Theory
- Probabilistic Discriminative Classifiers
- Logistic Regression
Logistic Regression
- Supervised machine learning algorithm
- Binary classification algorithm
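- learns a linear decision boundary, so it works best when the classes are roughly linearly separable (perfect separation is not required)
- predicts the probability that an input belongs to a specific class
- uses the sigmoid function to convert a linear score into a probability between 0 and 1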
Key takeaway:
Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$,
then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).
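A minimal sketch of this takeaway, assuming plain batch gradient descent on the average log-loss. The data is made up and deliberately overlapping, so no threshold separates the classes perfectly; the learning rate and step count are illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) as the sigmoid of the linear score z = w . x + b."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)


def log_loss(w, b, X, y):
    """Average negative log-likelihood of the labels."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = min(max(predict_proba(w, b, x_i), 1e-12), 1.0 - 1e-12)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total / len(y)


def train(X, y, learning_rate=0.5, steps=2000):
    """Minimise the log-loss with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x_i, y_i in zip(X, y):
            error = predict_proba(w, b, x_i) - y_i  # gradient w.r.t. z
            for j, x_ij in enumerate(x_i):
                grad_w[j] += error * x_ij / n
            grad_b += error / n
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


# Overlapping toy data: the point at 1.5 is class 1, the point at 2.0 is class 0.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

initial_loss = log_loss([0.0], 0.0, X, y)  # ln 2 when every prediction is 0.5
w, b = train(X, y)
final_loss = log_loss(w, b, X, y)
```

After training, `predict_proba(w, b, [3.0])` is above 0.5 and `predict_proba(w, b, [0.5])` is below it, even though two training points remain on the wrong side of the boundary.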
Hypothesis Testing
Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.
It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.
Key takeaway:
Hypothesis testing is not about proving something with certainty.
It is about asking:
If the null hypothesis were true, how surprising would this sample result be?
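One way to make that question concrete is a two-sided one-sample z-test, sketched below. It assumes the population standard deviation is known and the sample mean is approximately normal; all numbers are invented for illustration, and `one_sample_z_test` is a hypothetical helper.

```python
import math


def one_sample_z_test(sample_mean, null_mean, population_sd, n):
    """Return the z statistic and the two-sided p-value."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Null hypothesis: the population mean is 50.
z, p_value = one_sample_z_test(
    sample_mean=52.0, null_mean=50.0, population_sd=10.0, n=100
)

alpha = 0.05  # significance level chosen before looking at the data
reject_null = p_value < alpha
# z = 2.0 and p_value is about 0.0455, so the null is rejected at the 5% level.
```

The p-value is the "how surprising" number: the probability of a result at least this extreme if the null hypothesis were true.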
December 14, 2025
Foundation Model
Foundation models are AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.
They:
- are large deep learning neural networks
- are trained on massive and diverse datasets (text, images, audio, or multiple modalities)
- contain millions or billions of parameters
- are designed for a broad range of general tasks rather than a single task
- act as base models for building specialised AI applications
LLM – Large Language Model
Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.
They learn language by analysing massive amounts of text data, discovering patterns in:
- grammar
- meaning
- context
- relationships between words and sentences
LLMs are:
- built on deep learning
- implemented using neural networks
- based on Transformers
Often combined with tools like:
- Retrieval (RAG)
- Agents
- External APIs
- Memory systems
What makes an LLM special?
- Built using deep neural networks
- Trained on very large datasets (books, articles, code, web text)
- Can perform many tasks without task-specific training
- General-purpose language understanding, not single-task models
LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.
December 15, 2025
AI Agents
Also referred to as Agentic AI.
AI agents are intelligent systems that can plan, make decisions, and take actions to achieve goals with minimal human intervention.
A common use case is task automation, for example booking travel based on a user's request.
AI agents typically build on Generative AI and use Large Language Models (LLMs) as the reasoning core.
Agents often interact with tools (APIs, databases, calendars) to complete multi-step workflows.
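The tool-calling pattern can be sketched as a toy loop. Everything here is hypothetical: the two tools are stand-ins that return strings instead of calling real services, and `plan` is a fixed function standing in for the LLM that would normally decide which tools to call.

```python
def search_flights(destination):
    # Hypothetical tool: a real agent would call an external API here.
    return f"flight to {destination}"


def add_calendar_event(title):
    # Hypothetical tool: a real agent would write to a calendar service.
    return f"calendar event: {title}"


# Registry mapping tool names to the functions the agent may call.
TOOLS = {
    "search_flights": search_flights,
    "add_calendar_event": add_calendar_event,
}


def plan(request):
    """Stand-in for the LLM reasoning core: returns a fixed list of steps."""
    destination = request["destination"]
    return [
        ("search_flights", destination),
        ("add_calendar_event", f"Trip to {destination}"),
    ]


def run_agent(request):
    """Execute each planned step by looking up and calling the named tool."""
    results = []
    for tool_name, argument in plan(request):
        tool = TOOLS[tool_name]
        results.append(tool(argument))
    return results


results = run_agent({"destination": "Lisbon"})
```

In a real agent the plan would be produced step by step, with each tool result fed back to the model before it chooses the next action.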