February 21, 2026
Direct solution method - Ordinary Least Squares and the Line of Best Fit
#
Revision:
OLS is the direct method for linear regression. It finds the best-fit line by minimising the sum of squared residuals without iterative updates.
Direct Method vs Iterative Method ☆
#
Linear regression parameters can be found in two main ways.
| Method | Main idea | When used |
|---|---|---|
| Ordinary Least Squares | Compute the best parameters directly | Small or moderate datasets |
| Gradient Descent | Start with parameters and update repeatedly | Large datasets or many features |
flowchart LR
A["Linear Regression"] --> B["Direct Solution<br/>OLS"]
A --> C["Iterative Solution<br/>Gradient Descent"]
B --> B1["Normal Equation"]
B --> B2["No learning rate"]
B --> B3["One-shot solution"]
C --> C1["Learning rate"]
C --> C2["Repeated updates"]
C --> C3["Stops after convergence"]
style A fill:#E1F5FE,stroke:#5b7db1,color:#000
style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
style C fill:#FFF9C4,stroke:#b59b3b,color:#000
style B1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style B2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style B3 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C1 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C2 fill:#EDE7F6,stroke:#8a6fb3,color:#000
style C3 fill:#EDE7F6,stroke:#8a6fb3,color:#000
Why It Is Called “Least Squares” ☆
#
OLS is called least squares because it chooses the parameters that make the sum of squared residuals (the squared gaps between actual and predicted values) as small as possible.
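As a minimal sketch of the one-shot idea, the snippet below fits a single-feature line with the standard closed-form OLS formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function names and the data are made up for illustration and are not from the original notes.

```python
def ols_fit(xs, ys):
    """Closed-form OLS for the line y = slope * x + intercept (one feature)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centred cross-product (Sxy) and centred sum of squares (Sxx).
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that OLS minimises."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only (assumed), roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

slope, intercept = ols_fit(xs, ys)
best_ssr = sum_squared_residuals(xs, ys, slope, intercept)
nudged_ssr = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best_ssr, nudged_ssr)
```

No learning rate or loop over updates is needed, and nudging the fitted slope makes the sum of squared residuals larger, which is what "least squares" means in practice.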
February 21, 2026
Cost Function
#
Revision:
A cost function converts model error into a single number. Training means changing the model parameters until this number becomes as small as possible.
Why Cost Function Matters in ML ☆
#
A machine learning model needs a way to decide whether one set of parameters is better than another.
For linear regression, every possible value of the parameters gives a different line.
The cost function tells us which line is better by measuring how far the predictions are from the true values.
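To make the comparison concrete, here is a small sketch that scores two candidate lines on the same data. It assumes mean squared error as the cost, which is a common choice for linear regression but is not named in the text above; the data and the function name `mse_cost` are illustrative.

```python
def mse_cost(xs, ys, slope, intercept):
    """Mean squared error of the line y = slope * x + intercept."""
    n = len(xs)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n


# Illustrative data only (assumed): points lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

cost_a = mse_cost(xs, ys, slope=2.0, intercept=0.0)  # the line the data lies on
cost_b = mse_cost(xs, ys, slope=1.0, intercept=1.0)  # a worse candidate line
print(cost_a, cost_b)
```

The lower number identifies the better line, so training reduces to searching for the parameters with the smallest cost.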
February 21, 2026
Gradient Descent for Linear Regression
#
Revision:
Gradient descent is the step-by-step method for reducing the cost function when a direct closed-form solution is not convenient.
Where Gradient Descent Fits in ML ☆
#
Gradient descent is used when we want the model to learn parameters by repeatedly improving them.
For linear regression, it adjusts the slope and intercept until the prediction error becomes small.
flowchart LR
A["Initial Parameters"] --> B["Make Predictions"]
B --> C["Compute Cost"]
C --> D["Compute Gradient"]
D --> E["Update Parameters"]
E --> B
style A fill:#E1F5FE,stroke:#5b7db1,color:#000
style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
style C fill:#FFF9C4,stroke:#b59b3b,color:#000
style D fill:#EDE7F6,stroke:#8a6fb3,color:#000
style E fill:#C8E6C9,stroke:#5f8f6a,color:#000
Core Idea ☆
#
The gradient tells us the direction in which the cost increases fastest, so gradient descent moves the parameters a small step in the opposite direction.
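The predict, compute gradient, update loop can be sketched as follows. This is a minimal illustration, assuming a mean-squared-error cost, a fixed learning rate, and a fixed number of steps in place of a convergence check; the data and parameter values are made up.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by repeatedly stepping against the gradient."""
    n = len(xs)
    slope, intercept = 0.0, 0.0  # initial parameters
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Move against the gradient to reduce the cost.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Illustrative data only (assumed): points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = gradient_descent(xs, ys)
print(slope, intercept)
```

With a learning rate that is too large the updates can overshoot and diverge, which is why this method, unlike OLS, needs that extra setting tuned.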
Linear models for Classification
#
- Categorise data by finding a linear boundary (hyperplane) that separates the classes
- Compute a weighted sum of the input features plus a bias, then assign the class according to which side of the boundary the score falls on
flowchart TD
T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
T --> LR["Logistic<br/>regression"]
T --> SVM["Linear<br/>SVM"]
P -->|uses| STEP["Step<br/>activation"]
LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
SVM -->|uses| HNG["Hinge<br/>loss"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000
style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
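The weighted sum plus bias, followed by a step activation as in the perceptron branch of the diagram, can be sketched like this. The weights and bias are hypothetical hand-picked values, not learned ones, and the function names are illustrative.

```python
def linear_score(weights, bias, features):
    """Weighted sum of the input features plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def step_classify(weights, bias, features):
    """Step activation: class 1 on or above the boundary, class 0 below it."""
    return 1 if linear_score(weights, bias, features) >= 0 else 0


# Hypothetical parameters: the boundary is the line x1 - x2 = 0.
weights = [1.0, -1.0]
bias = 0.0

label_a = step_classify(weights, bias, [3.0, 1.0])  # score  2.0
label_b = step_classify(weights, bias, [1.0, 3.0])  # score -2.0
print(label_a, label_b)
```

Logistic regression and the linear SVM use the same score and differ mainly in how it is turned into a loss during training.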
- Discriminant Functions
- Decision Theory
- Probabilistic Discriminative Classifiers
- Logistic Regression
Logistic Regression
#
- Supervised machine learning algorithm
- Binary classification algorithm
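- Learns a linear decision boundary, so it works best when the classes are approximately linearly separable (perfect separability is not required)
- Predicts the probability that an input belongs to a specific class
- Uses the sigmoid function to convert the linear score into a probability between 0 and 1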
Key takeaway:
Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$,
then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).
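A minimal sketch of the takeaway: compute $z = w\cdot x + b$, pass it through the sigmoid, and reduce the log-loss with plain gradient descent. The training loop, learning rate, step count, and data are assumptions for illustration; real libraries typically use more robust solvers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y=1 | x): sigmoid of the linear score z = w . x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def log_loss(w, b, X, y):
    """Average negative log-likelihood of the labels."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)


def train(X, y, learning_rate=0.1, steps=1000):
    """Gradient descent on the log-loss (an assumed, simple optimiser)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of the loss w.r.t. z
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij / n
            grad_b += err / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


# Illustrative data only (assumed): one feature, classes overlap slightly.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

initial_loss = log_loss([0.0], 0.0, X, y)
w, b = train(X, y)
final_loss = log_loss(w, b, X, y)
print(initial_loss, final_loss, predict_proba(w, b, [3.0]))
```

The example data is deliberately not perfectly separable, which shows that the model still fits sensibly when the classes overlap.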
Hypothesis Testing
#
Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.
It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.
Key takeaway:
Hypothesis testing is not about proving something with certainty.
It is about asking:
If the null hypothesis were true, how surprising would this sample result be?
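The "how surprising" question can be made concrete with a p-value. The sketch below assumes the simplest textbook case, a two-sided one-sample z-test with a known population standard deviation; all the numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def z_test_two_sided(sample_mean, null_mean, sigma, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a result at least this extreme if the null hypothesis is true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative numbers only (assumed): H0 says the population mean is 50.
z, p_value = z_test_two_sided(sample_mean=52.0, null_mean=50.0, sigma=10.0, n=100)

alpha = 0.05  # significance level chosen before looking at the data
reject_null = p_value < alpha
print(z, p_value, reject_null)
```

A small p-value means the sample would be surprising under the null hypothesis; it does not prove the alternative with certainty.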
December 14, 2025
Foundation Model
#
Foundation models are AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.
- They are large deep learning neural networks trained on massive and diverse datasets (text, images, audio, or multiple modalities).
- They contain millions or billions of parameters.
- They are designed for a broad range of general-purpose tasks rather than a single task.
- They act as base models for building specialised AI applications.
LLM – Large Language Model
#
Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.
They learn language by analysing massive amounts of text data, discovering patterns in:
- grammar
- meaning
- context
- relationships between words and sentences
LLMs are:
- Built on deep learning
- Implemented using neural networks
- Based on Transformers
Often combined with tools like:
- Retrieval (RAG)
- Agents
- External APIs
- Memory systems
What makes an LLM special?
#
- Built using deep neural networks
- Trained on very large datasets (books, articles, code, web text)
- Can perform many tasks without task-specific training
- General-purpose language understanding, not single-task models
LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.
December 15, 2025
AI Agents
#
Also referred to as Agentic AI.
AI agents are intelligent systems that can plan, make decisions, and take actions to achieve goals with minimal human intervention.
A common use case is task automation, for example booking travel based on a user’s request.
AI agents typically build on Generative AI and use Large Language Models (LLMs) as the reasoning core.
Agents often interact with tools (APIs, databases, calendars) to complete multi-step workflows.
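The plan, decide, act pattern can be sketched in a few lines. Everything here is hypothetical: the two tools are stubs standing in for real APIs, and `plan` is a hard-coded stand-in for the LLM that would normally choose the steps.

```python
def search_flights(destination):
    # Hypothetical tool: a real agent would call an external flight API here.
    return f"flight to {destination}"


def add_calendar_event(title):
    # Hypothetical tool standing in for a calendar integration.
    return f"calendar event: {title}"


# Registry of the tools the agent is allowed to use.
TOOLS = {
    "search_flights": search_flights,
    "add_calendar_event": add_calendar_event,
}


def plan(goal):
    """Stand-in for the LLM reasoning core: returns fixed (tool, argument) steps."""
    destination = goal.split(" to ")[-1]
    return [
        ("search_flights", destination),
        ("add_calendar_event", f"Trip to {destination}"),
    ]


def run_agent(goal):
    """Execute each planned step by calling the named tool."""
    results = []
    for tool_name, argument in plan(goal):
        results.append(TOOLS[tool_name](argument))
    return results


results = run_agent("Book travel to Lisbon")
print(results)
```

In a real agent the plan would be produced and revised by the model based on each tool's output, rather than fixed in advance.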
Retrieval-Augmented Generation (RAG)
#
Retrieval-Augmented Generation (RAG) is a system design pattern that improves an LLM’s answers by:
- Retrieving relevant information from an external knowledge source, and then
- Augmenting the LLM prompt with that retrieved context before generating the final response.
RAG helps an LLM look things up first, then answer using evidence.
Why RAG is Useful
#
RAG is commonly used when:
- Your knowledge is in private documents (PDFs, policies, internal wiki)
- You need up-to-date information (things not in the model’s training data)
- You want fewer hallucinations by grounding answers in retrieved sources
- You want traceability (show “where the answer came from”)
RAG does not change the model weights.
It changes what the model sees at inference time by adding retrieved context.
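The retrieve-then-augment flow can be sketched without any model at all. This is a toy illustration under stated assumptions: retrieval uses simple keyword overlap instead of the embedding search most real systems use, the documents are invented, and the final LLM call is left out.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words, ignoring basic punctuation."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return set(text.lower().split())


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question (toy retriever)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Augment the prompt with the retrieved context before generation."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Invented example documents standing in for a private knowledge source.
documents = [
    "Annual leave requests must be submitted two weeks in advance.",
    "The office wifi password is rotated every month.",
    "Expense reports are due by the fifth day of each month.",
]

question = "When are expense reports due?"
top_docs = retrieve(question, documents)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The model weights are never touched; only the prompt changes, and the retrieved documents can be shown to the user for traceability.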
May 28, 2026
Mathematical Foundations for Machine Learning
#
Machine Learning is built on mathematical principles that allow models to:
- represent data
- learn patterns
- optimise performance
flowchart LR
DATA[Data]
MATH[Math Models]
OPT[Optimisation]
MODEL[Trained Model]
DATA --> MATH
MATH --> OPT
OPT --> MODEL
Understanding how ML algorithms work internally requires a few core mathematical tools. Algebra deals with relationships between variables and quantities, while calculus focuses on change and optimisation.