February 21, 2026

# Gradient Descent for Linear Regression
Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.
- Iterative method
- Types: batch / stochastic / mini-batch
Key takeaway:
Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.
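The takeaway above can be sketched as a plain-Python update loop for 1-D linear regression with an MSE cost; the dataset, learning rate, and step count below are illustrative choices, not prescribed values.

```python
# Sketch: batch gradient descent for y = w*x + b, minimising mean squared error.

def gradient_descent(xs, ys, lr=0.05, steps=500):
    """Fit y = w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0  # initial parameter values
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Update in the direction that reduces the cost
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
w, b = gradient_descent(xs, ys)
```

Because the toy data is exactly linear, the loop converges to the true line; on real data it stops near the least-squares fit instead.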
```mermaid
flowchart TD
    GD["Gradient<br/>Descent"] -->|minimises| CF["Cost<br/>function"]
    GD -->|updates| W["Parameters<br/>(weights)"]
    GD -->|uses| GR["Gradient<br/>(slope)"]
    GD --> H["Hyperparameters"]
    H --> LR["Learning<br/>rate"]
    H --> BS["Batch<br/>size"]
    H --> EP["Epochs"]
    style GD fill:#90CAF9,stroke:#1E88E5,color:#000
    style CF fill:#CE93D8,stroke:#8E24AA,color:#000
    style W fill:#CE93D8,stroke:#8E24AA,color:#000
    style GR fill:#CE93D8,stroke:#8E24AA,color:#000
    style H fill:#CE93D8,stroke:#8E24AA,color:#000
    style LR fill:#CE93D8,stroke:#8E24AA,color:#000
    style BS fill:#CE93D8,stroke:#8E24AA,color:#000
    style EP fill:#CE93D8,stroke:#8E24AA,color:#000
```
## Types of GD
```mermaid
flowchart TD
    T["Gradient Descent<br/>types"] --> BGD["Batch<br/>GD"]
    T --> SGD["Stochastic<br/>GD"]
    T --> MGD["Mini-batch<br/>GD"]
    BGD --> ALL["All data<br/>per step"]
    BGD --> STB["Smooth<br/>updates"]
    SGD --> ONE["1 sample<br/>per step"]
    SGD --> FAST["Quick<br/>progress"]
    SGD --> NOISE["Noisy<br/>updates"]
    MGD --> MB["Small batch<br/>per step"]
    MGD --> PRACT["Practical<br/>default"]
    style T fill:#90CAF9,stroke:#1E88E5,color:#000
    style BGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style SGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style MGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style ALL fill:#CE93D8,stroke:#8E24AA,color:#000
    style STB fill:#CE93D8,stroke:#8E24AA,color:#000
    style ONE fill:#CE93D8,stroke:#8E24AA,color:#000
    style FAST fill:#CE93D8,stroke:#8E24AA,color:#000
    style NOISE fill:#CE93D8,stroke:#8E24AA,color:#000
    style MB fill:#CE93D8,stroke:#8E24AA,color:#000
    style PRACT fill:#CE93D8,stroke:#8E24AA,color:#000
```
### Batch

- Uses the entire training set for every update, so each step is smooth but expensive
- Use only if you can afford heavy compute and long training times
### SGD

- Updates the parameters using one randomly chosen sample per step, making quick progress at the cost of noisy updates
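The per-sample update can be sketched on the same linear-regression cost; the learning rate, epoch count, and shuffled visiting order are illustrative choices.

```python
import random

def sgd_linear(xs, ys, lr=0.02, epochs=200, seed=0):
    """Stochastic gradient descent: one (x, y) sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the samples in a fresh order each epoch
        for i in idx:
            err = w * xs[i] + b - ys[i]  # per-sample prediction error
            w -= lr * 2 * err * xs[i]    # gradient of err^2 w.r.t. w
            b -= lr * 2 * err            # gradient of err^2 w.r.t. b
    return w, b

w, b = sgd_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this exactly linear toy data the noise dies out at the optimum, so SGD still lands on w ≈ 2, b ≈ 1; on noisy data it would jitter around the minimum instead.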
## Linear Models for Classification

- Categorise data by finding a linear boundary (a hyperplane) that separates the classes
- Score each example by computing a weighted sum of its input features plus a bias
```mermaid
flowchart TD
    T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
    T --> LR["Logistic<br/>regression"]
    T --> SVM["Linear<br/>SVM"]
    P -->|uses| STEP["Step<br/>activation"]
    LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
    SVM -->|uses| HNG["Hinge<br/>loss"]
    style T fill:#90CAF9,stroke:#1E88E5,color:#000
    style P fill:#C8E6C9,stroke:#2E7D32,color:#000
    style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
    style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000
    style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
    style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
    style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
```
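The perceptron branch above (step activation, mistake-driven updates) can be sketched in a few lines; the toy dataset and epoch cap are illustrative.

```python
def perceptron_train(samples, labels, epochs=20):
    """Perceptron: step activation, update the weights only on mistakes."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # labels in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score >= 0 else -1  # step activation
            if pred != y:                   # mistake-driven update rule
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# AND-like, linearly separable toy data
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [-1, -1, -1, 1]
w, b = perceptron_train(X, Y)
```

For linearly separable data like this, the perceptron convergence theorem guarantees the loop stops making mistakes after finitely many updates.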
## Discriminant Functions
## Decision Theory
## Probabilistic Discriminative Classifiers
## Logistic Regression
- Supervised machine-learning algorithm for binary classification
- Predicts the probability that an input belongs to a specific class
- Learns a linear decision boundary, so it works best when the classes are roughly linearly separable (strict separability is not required, and perfectly separable data actually makes the weights diverge without regularisation)
- Uses the sigmoid function to convert the linear score into a probability between 0 and 1
Key takeaway:
Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$,
then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).
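The takeaway above can be sketched as gradient descent on the mean log-loss for 1-D inputs; the dataset, learning rate, and step count are illustrative.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Learn w, b by gradient descent on the log-loss (1-D inputs)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss: (1/n) * sum((sigmoid(z) - y) * x)
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]  # classes split around x = 0
w, b = fit_logistic(xs, ys)
p_pos = sigmoid(w * 1.0 + b)  # P(y=1 | x=1)
```

Note the clean form of the gradient, (sigmoid(z) − y)·x: maximising the likelihood and minimising the log-loss give exactly this same update.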
December 14, 2025

# Foundation Model
AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.

- Large deep-learning neural networks containing millions or billions of parameters
- Trained on massive, diverse datasets (text, images, audio, or multiple modalities)
- Designed for general-purpose use across a broad range of tasks, not a single task
- Act as base models for building specialised AI applications
## LLM – Large Language Model
Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.
They learn language by analysing massive amounts of text data, discovering patterns in:

- grammar
- meaning
- context
- relationships between words and sentences

LLMs are:

- Built on deep learning
- Implemented using neural networks
- Based on the Transformer architecture

Often combined with tools like:

- Retrieval (RAG)
- Agents
- External APIs
- Memory systems
### What makes an LLM special?
- Built using deep neural networks
- Trained on very large datasets (books, articles, code, web text)
- Can perform many tasks without task-specific training
- General-purpose language understanding, not single-task models
LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.
# Decision Tree
A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).
Key takeaway:
A decision tree grows by repeatedly splitting the training data into purer subsets using an impurity measure
(Entropy / Gini / Classification Error).
Decision trees need a way to measure:
“How mixed are the class labels at a node?”
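The impurity measures named above can be computed directly from the class counts at a node; a small sketch in plain Python:

```python
import math

def entropy(counts):
    """Entropy of the class counts at a node, in bits."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

pure = [10, 0]   # all labels in one class: no mixing
mixed = [5, 5]   # 50/50 split: maximally mixed for two classes
```

A split is chosen by comparing the impurity of the parent node against the weighted impurity of the children; the bigger the drop, the better the question.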
# Instance-based Learning
Instance-based learning is a family of methods that do not build one explicit global model during training. Instead, they store training examples and delay most of the work until a new query arrives.
When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.
Instance-based learning centres on a few linked ideas: storing the examples rather than a model (lazy learning), measuring similarity with a distance metric, and predicting from the nearest neighbours (k-NN).
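A minimal k-nearest-neighbours sketch, assuming squared Euclidean distance and a majority vote; the toy data and choice of k are illustrative.

```python
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    # Squared Euclidean distance from the query to every stored example
    dists = sorted(
        (sum((a - q) ** 2 for a, q in zip(x, query)), y)
        for x, y in zip(train, labels)
    )
    top_k = [y for _, y in dists[:k]]          # labels of the k closest points
    return Counter(top_k).most_common(1)[0][0]  # majority label

train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
label = knn_classify(train, labels, (0.5, 0.5))
```

Note where the work happens: training is just storage, and all the distance computation is deferred until a query arrives.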
# Support Vector Machine (SVM)
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for:
- Classification (most common)
- Regression (SVR – Support Vector Regression)
Its goal is to find the decision boundary that separates the classes with the maximum margin.
A Support Vector Machine is a supervised learning algorithm that finds an optimal hyperplane by maximising the margin between classes, using support vectors and kernel functions to handle non-linear data.
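As a small illustration of the margin idea, here is a hinge-loss sketch in plain Python; the weights and toy points are illustrative, not a trained SVM.

```python
def hinge_loss(w, b, samples, labels):
    """Mean hinge loss max(0, 1 - y*(w·x + b)), with labels in {-1, +1}."""
    total = 0.0
    for x, y in zip(samples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += max(0.0, 1.0 - y * score)  # zero once the margin exceeds 1
    return total / len(samples)

X = [[2.0, 2.0], [-2.0, -2.0]]
Y = [1, -1]
# Both hyperplanes classify the points correctly, but only the one with
# functional margin >= 1 on every point incurs zero hinge loss.
loss_good = hinge_loss([0.5, 0.5], 0.0, X, Y)   # margins of 2 on both points
loss_small = hinge_loss([0.1, 0.1], 0.0, X, Y)  # margins of only 0.4
```

This is why SVM training pushes points past the margin rather than merely onto the correct side: correct-but-close predictions are still penalised.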