Probability often changes when we learn new information.
Conditional probability and Bayes’ theorem give a structured way to update beliefs using evidence.
Conditional probability updates probabilities after observing an event.
Bayes’ theorem lets you estimate a hidden cause from observed evidence.
Naïve Bayes turns Bayes’ theorem into a practical classifier by assuming conditional independence of features given the class.
flowchart TD
A[Conditional<br/>probability] -->|foundation| B[Bayes<br/>theorem]
D[Independent<br/>events] -->|implies| C[Independence]
C -->|simplifies| A
E[Prior] -->|with likelihood| B
F[Likelihood] -->|updates| H[Posterior]
G[Evidence] -->|normalises| B
B -->|yields| H
I[Naïve<br/>Bayes] -->|uses| B
J[Naïve<br/>assumption] -->|assumes| C
K[Features] -->|given class| J
L[Class] -->|conditions| J
I -->|predicts| M[Classification]
M -->|selects| L
style A fill:#90CAF9,stroke:#1E88E5,color:#000
style B fill:#90CAF9,stroke:#1E88E5,color:#000
style C fill:#90CAF9,stroke:#1E88E5,color:#000
style D fill:#CE93D8,stroke:#8E24AA,color:#000
style E fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style G fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style K fill:#CE93D8,stroke:#8E24AA,color:#000
style L fill:#CE93D8,stroke:#8E24AA,color:#000
style H fill:#C8E6C9,stroke:#2E7D32,color:#000
style I fill:#C8E6C9,stroke:#2E7D32,color:#000
style M fill:#C8E6C9,stroke:#2E7D32,color:#000
Probability distributions are the bridge between:
real-world randomness and mathematical modelling.
A random experiment produces outcomes.
A random variable turns those outcomes into numbers.
A probability distribution tells you how likely each number (or range of numbers) is.
Key takeaway:
A distribution is a complete “story” about uncertainty:
what values are possible, how likely they are, and how we summarise them (mean, variance).
Many ML models are probabilistic:
they assume data (or errors) follow a distribution.
Loss functions often come from distribution assumptions:
squared loss aligns with Gaussian noise.
Naïve Bayes (from the previous module) becomes practical once you can model:
\( P(X\mid Y) \)
using suitable distributions.
In practice:
choosing a distribution is a modelling decision.
It affects:
prediction, uncertainty estimates, and what “rare” or “typical” means in your data.