Probability distributions are the bridge between real-world randomness and mathematical modelling.
A random experiment produces outcomes.
A random variable turns those outcomes into numbers.
A probability distribution tells you how likely each number (or range of numbers) is.
Key takeaway:
A distribution is a complete “story” about uncertainty:
what values are possible, how likely they are, and how we summarise them (mean, variance).
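The "story" above can be written out concretely. As a minimal sketch, here is the full distribution of a fair six-sided die: its possible values, their probabilities, and the two standard summaries (mean and variance).

```python
import numpy as np

# The complete "story" of a fair six-sided die as a discrete distribution:
# possible values, their probabilities, and the summary statistics.
values = np.arange(1, 7)      # possible outcomes: 1..6
probs = np.full(6, 1 / 6)     # each outcome equally likely

mean = np.sum(values * probs)                     # E[X]
variance = np.sum((values - mean) ** 2 * probs)   # Var(X) = E[(X - E[X])^2]

print(mean)      # ≈ 3.5
print(variance)  # ≈ 2.917
```

Every distribution, however complicated, answers the same three questions this snippet does: what can happen, how likely is each value, and what are the typical value and spread.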
Many ML models are probabilistic:
they assume data (or errors) follow a distribution.
Loss functions often come from distribution assumptions:
squared loss aligns with Gaussian noise.
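The link between squared loss and Gaussian noise can be checked numerically: the Gaussian negative log-likelihood of a prediction differs from the squared error only by an additive constant, so minimising one minimises the other. A small sketch on illustrative synthetic data:

```python
import numpy as np

# Sketch: Gaussian negative log-likelihood vs squared error.
# Synthetic targets and predictions, purely for illustration.
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.1, size=100)

sigma = 1.0
# Gaussian NLL per point: 0.5*log(2*pi*sigma^2) + (y - yhat)^2 / (2*sigma^2)
nll = 0.5 * np.log(2 * np.pi * sigma**2) + (y_true - y_pred) ** 2 / (2 * sigma**2)
# Squared error per point (scaled to match)
sse = 0.5 * (y_true - y_pred) ** 2

# The difference is the same constant for every point,
# so the two losses have the same minimiser.
diff = nll - sse
assert np.allclose(diff, diff[0])
```

This is why assuming Gaussian noise on the targets leads directly to training with mean squared error.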
Naïve Bayes (from the previous module) becomes practical once you can model \( P(X\mid Y) \) using suitable distributions.
In practice:
choosing a distribution is a modelling decision.
It affects:
prediction, uncertainty estimates, and what “rare” or “typical” means in your data.
A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).
Single neuron for regression (predicting how much / how many)
Data + linear model (single neuron, no hidden layers) + squared loss
Training using the batch gradient descent algorithm
Prediction (inference)
E.g. Auto MPG (UCI)-style prediction with a single neuron (from-scratch code)
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
M --> A["Activation<br/>Identity"]
A --> L["Loss<br/>MSE (Squared error)"]
L --> O["Optimiser<br/>Batch GD / Mini-batch GD"]
O --> P["Parameters<br/>w, b"]
P --> I["Inference<br/>Predict ŷ (number) for new x"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
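The pipeline in the diagram can be sketched from scratch in a few lines. This uses synthetic data for self-containment (an Auto MPG-style dataset would slot in the same way): a single linear neuron with identity activation, MSE loss, and batch gradient descent.

```python
import numpy as np

# From-scratch sketch of the diagram above: single linear neuron
# (identity activation), MSE loss, batch gradient descent.
# Synthetic data stands in for a real dataset such as Auto MPG.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 3))            # 200 samples, 3 features
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b + rng.normal(scale=0.05, size=200)

w = np.zeros(3)   # parameters to learn
b = 0.0
lr = 0.1          # learning rate (illustrative choice)

for epoch in range(500):
    y_hat = X @ w + b                  # forward pass (identity activation)
    err = y_hat - y
    grad_w = 2 * X.T @ err / len(y)    # d(MSE)/dw
    grad_b = 2 * err.mean()            # d(MSE)/db
    w -= lr * grad_w                   # batch GD update on the full dataset
    b -= lr * grad_b

print(w, b)  # should approach true_w and true_b
```

Inference is then just the forward pass: `X_new @ w + b` returns a predicted number for each new input row.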
Generative Artificial Intelligence (GenAI) refers to a class of AI systems that can generate new content such as text, images, audio, video, or code, rather than only making predictions or classifications.
GenAI systems learn patterns and representations from large datasets and use them to produce novel outputs that resemble the data they were trained on.
The AI pipeline is a continuous cycle: data is collected and prepared, models are trained and evaluated, and deployed models are monitored and improved over time.
timeline
title AI Pipeline
Collect Data : Data Ingestion
: Data Understanding
Prepare Data : Cleaning
: Feature Engineering
: Sampling
Train Model : Model Training
: Validation & Metrics
Deploy Model : Deployment
: Monitoring & Retraining
Linear Regression is a supervised ML method used to predict a numerical target by fitting a model that is linear in its parameters.
In ML, linear models are a core baseline:
they’re fast, often surprisingly strong, and usually easy to interpret.
Key takeaway:
Linear Regression learns parameters by minimising a squared-error cost.
You can solve it directly (closed form) or iteratively (gradient descent),
and you can extend it using basis functions and regularisation.
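The extension via basis functions and regularisation can be shown in one short sketch. Assuming a polynomial basis and an illustrative regularisation strength, the model is nonlinear in the input x but still linear in the parameters w, so the ridge closed form applies directly:

```python
import numpy as np

# Sketch: basis functions + ridge regularisation.
# The model is nonlinear in x but linear in the parameters w.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = np.sin(x) + rng.normal(scale=0.1, size=100)   # noisy nonlinear target

# Polynomial basis: phi(x) = [1, x, x^2, x^3]
Phi = np.column_stack([x**d for d in range(4)])

lam = 1e-3  # regularisation strength (illustrative value)
# Ridge closed form: w = (Phi^T Phi + lam*I)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)

y_hat = Phi @ w
mse = np.mean((y - y_hat) ** 2)
print(mse)  # small: a cubic approximates sin well on [-2, 2]
```

The same fitting machinery works for any choice of basis; only the feature matrix `Phi` changes.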
A random variable is a way to attach numbers to outcomes of a random experiment.
It lets us move from:
“what happened?”
to:
“what number should we analyse?”
Key takeaway:
A random variable is a function from the sample space to real numbers.
Once you define the random variable clearly, the rest (pmf/pdf/cdf, mean, variance) becomes systematic.
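The definition "a function from the sample space to real numbers" is literal, and a small example makes the rest systematic. Sketch: toss two fair coins, and let X be the number of heads.

```python
# Sketch: a random variable is literally a function from outcomes to numbers.
# Experiment: toss two fair coins; X = number of heads.
from itertools import product
from fractions import Fraction

sample_space = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ...]

def X(outcome):
    return outcome.count("H")   # the random variable: outcome -> number

# Build the pmf by summing the (uniform) outcome probabilities
pmf = {}
for omega in sample_space:
    pmf[X(omega)] = pmf.get(X(omega), Fraction(0)) + Fraction(1, 4)

# pmf: P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4
mean = sum(k * p for k, p in pmf.items())          # E[X] = 1
var = sum((k - mean) ** 2 * p for k, p in pmf.items())  # Var(X) = 1/2
print(mean, var)
```

Once `X` is pinned down, the pmf, mean, and variance all follow mechanically, exactly as the takeaway says.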
flowchart TD
PD["Probability<br/>distributions"] --> RV["Random<br/>variables"]
RV --> T["Types"]
T --> RV1["Discrete<br/>RVs"]
T --> RV2["Continuous<br/>RVs"]
RV --> F["PMF / PDF / CDF"]
RV --> S["Mean / Variance<br/>Covariance"]
RV --> J["Joint & Marginal<br/>distributions"]
RV --> X["Transformations"]
style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style RV fill:#90CAF9,stroke:#1E88E5,color:#000
style T fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style S fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style X fill:#CE93D8,stroke:#8E24AA,color:#000
style RV1 fill:#CE93D8,stroke:#8E24AA,color:#000
style RV2 fill:#CE93D8,stroke:#8E24AA,color:#000
Once you can describe a random variable using a pmf or pdf, the next step is to use
named distributions that appear repeatedly in real data and in ML models.
Key takeaway:
Named distributions give you ready-made probability models for common patterns:
binary outcomes, counts, and measurement noise.
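The three patterns map onto three named distributions, which can be sampled directly. A minimal sketch (illustrative parameter values):

```python
import numpy as np

# Sketch: three named distributions matched to common data patterns.
rng = np.random.default_rng(7)

# Binary outcomes -> Bernoulli(p): e.g. click / no-click
clicks = rng.binomial(n=1, p=0.3, size=10_000)

# Counts -> Poisson(lam): e.g. arrivals per minute
arrivals = rng.poisson(lam=4.0, size=10_000)

# Measurement noise -> Gaussian(mu, sigma): e.g. sensor error
noise = rng.normal(loc=0.0, scale=0.5, size=10_000)

# Sample means approach the theoretical means (law of large numbers)
print(clicks.mean())    # ≈ 0.3
print(arrivals.mean())  # ≈ 4.0
print(noise.mean())     # ≈ 0.0
```

Choosing among these is the modelling decision: the same count data looks very different under a Poisson assumption than under a Gaussian one.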
Direct solution method - Ordinary Least Squares and the Line of Best Fit
It is possible to compute the best parameters for linear regression in one shot (closed form),
instead of improving them iteratively step by step.
For linear regression, the direct method is usually Ordinary Least Squares (OLS).
Ordinary Least Squares (OLS) chooses the “best” line by minimising squared prediction errors.
Key takeaway:
OLS defines “best fit” as the line that minimises the total squared residual error across all data points.
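The closed-form solution comes from the normal equations: setting the gradient of the total squared residual to zero gives \( w = (X^\top X)^{-1} X^\top y \). A minimal sketch on synthetic data, with a bias column appended so the intercept is learned alongside the weights:

```python
import numpy as np

# Sketch: OLS in one shot via the normal equations,
# w = (X^T X)^{-1} X^T y, with an intercept column appended to X.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 2))
y = X @ np.array([1.5, -2.0]) + 0.7 + rng.normal(scale=0.1, size=150)

Xb = np.column_stack([X, np.ones(len(X))])   # add intercept column
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)     # solve the normal equations

print(w)  # ≈ [1.5, -2.0, 0.7]: slopes, then intercept
```

Using `np.linalg.solve` rather than explicitly inverting \( X^\top X \) is the standard numerically stable choice.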