Probability often changes when we learn new information, and conditional probability and Bayes’ theorem give a structured way to update beliefs using evidence. Conditional probability re-weights probabilities after an event is observed; Bayes’ theorem lets you infer a hidden cause from observed evidence. Naïve Bayes turns Bayes’ theorem into a practical classifier by assuming the features are conditionally independent given the class.
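Written out, Bayes’ theorem is \( P(Y\mid X) = \frac{P(X\mid Y)\,P(Y)}{P(X)} \). As a minimal sketch of “inferring a hidden cause from observed evidence”, here is the classic medical-test calculation in Python; the disease rate, sensitivity, and false-positive rate below are made-up illustrative numbers, not from the source:

```python
# Hypothetical numbers (illustrative only)
p_disease = 0.01              # prior P(disease)
p_pos_given_disease = 0.95    # likelihood P(+ | disease), the sensitivity
p_pos_given_healthy = 0.05    # false-positive rate P(+ | no disease)

# Evidence P(+) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: posterior P(disease | +)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # ≈ 0.161, far below the 95% sensitivity
```

The small prior dominates: even a fairly accurate test leaves the posterior well under 20% when the condition is rare.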
```mermaid
flowchart TD
A[Conditional<br/>probability] -->|foundation| B[Bayes<br/>theorem]
D[Independent<br/>events] -->|satisfy| C[Independence]
C -->|simplifies| A
E[Prior] -->|with likelihood| B
F[Likelihood] -->|updates| H[Posterior]
G[Evidence] -->|normalises| B
B -->|yields| H
I[Naïve<br/>Bayes] -->|uses| B
J[Naïve<br/>assumption] -->|assumes| C
K[Features] -->|given class| J
L[Class] -->|conditions| J
I -->|predicts| M[Classification]
M -->|selects| L
style A fill:#90CAF9,stroke:#1E88E5,color:#000
style B fill:#90CAF9,stroke:#1E88E5,color:#000
style C fill:#90CAF9,stroke:#1E88E5,color:#000
style D fill:#CE93D8,stroke:#8E24AA,color:#000
style E fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style G fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style K fill:#CE93D8,stroke:#8E24AA,color:#000
style L fill:#CE93D8,stroke:#8E24AA,color:#000
style H fill:#C8E6C9,stroke:#2E7D32,color:#000
style I fill:#C8E6C9,stroke:#2E7D32,color:#000
style M fill:#C8E6C9,stroke:#2E7D32,color:#000
```
Probability distributions are the bridge between real-world randomness and mathematical modelling. A random experiment produces outcomes, a random variable turns those outcomes into numbers, and a probability distribution tells you how likely each number (or range of numbers) is.

Key takeaway:
A distribution is a complete “story” about uncertainty: what values are possible, how likely they are, and how we summarise them (mean, variance).
Many ML models are probabilistic: they assume the data (or the errors) follow a distribution. Loss functions often come from distribution assumptions; for example, squared loss corresponds to an assumption of Gaussian noise, as the derivation below shows.
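To see why, suppose each observation is \( y_i = f(x_i) + \epsilon_i \) with independent Gaussian noise \( \epsilon_i \sim \mathcal{N}(0, \sigma^2) \); this is the standard modelling assumption, not something specific to this course. The log-likelihood of the data is

\[
\log L(f) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2,
\]

so maximising the likelihood over \( f \) is the same as minimising \( \sum_i \left(y_i - f(x_i)\right)^2 \), i.e. the squared loss.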
Naïve Bayes (from the previous module) becomes practical once you can model \( P(X\mid Y) \) using suitable distributions.
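As a minimal sketch of that idea, assuming continuous features and one Gaussian per class per feature (i.e. Gaussian Naïve Bayes; all names here are illustrative):

```python
import numpy as np

def fit(X, y):
    """Estimate P(Y) and per-class Gaussian parameters for P(X | Y)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-9,  # small floor for numerical stability
        }
    return params

def predict(params, x):
    """Pick the class maximising log P(Y=c) + sum_j log N(x_j; mean, var)."""
    def log_posterior(p):
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"])
                                + (x - p["mean"]) ** 2 / p["var"])
        return np.log(p["prior"]) + log_lik
    return max(params, key=lambda c: log_posterior(params[c]))
```

The naïve assumption appears in the sum over features: the joint likelihood factorises into one univariate Gaussian per feature.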
In practice, choosing a distribution is a modelling decision. It affects prediction, uncertainty estimates, and what “rare” or “typical” means in your data.
A random variable is a way to attach numbers to the outcomes of a random experiment. It lets us move from “what happened?” to “what number should we analyse?”
Key takeaway:
A random variable is a function from the sample space to real numbers.
Once you define the random variable clearly, the rest (pmf/pdf/cdf, mean, variance) becomes systematic.
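Here is a minimal sketch of that definition in Python (the experiment, two fair dice, is illustrative): the sample space is a plain list of outcomes, the random variable is literally a function on it, and the pmf, mean, and variance follow mechanically.

```python
from itertools import product
from collections import Counter

# Sample space: all 36 ordered outcomes of rolling two fair dice
omega = list(product(range(1, 7), repeat=2))

# Random variable: a function from outcomes to real numbers (the sum)
def X(outcome):
    return outcome[0] + outcome[1]

# pmf of X, assuming all outcomes are equally likely
counts = Counter(X(w) for w in omega)
pmf = {x: c / len(omega) for x, c in sorted(counts.items())}

mean = sum(x * p for x, p in pmf.items())                # E[X] = 7.0
var = sum((x - mean) ** 2 * p for x, p in pmf.items())   # Var(X) ≈ 5.83
print(pmf)
print(mean, var)
```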
```mermaid
flowchart TD
PD["Probability<br/>distributions"] --> RV["Random<br/>variables"]
RV --> T["Types"]
T --> RV1["Discrete<br/>RVs"]
T --> RV2["Continuous<br/>RVs"]
RV --> F["PMF / PDF / CDF"]
RV --> S["Mean / Variance<br/>Covariance"]
RV --> J["Joint & Marginal<br/>distributions"]
RV --> X["Transformations"]
style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style RV fill:#90CAF9,stroke:#1E88E5,color:#000
style T fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style S fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style X fill:#CE93D8,stroke:#8E24AA,color:#000
style RV1 fill:#CE93D8,stroke:#8E24AA,color:#000
style RV2 fill:#CE93D8,stroke:#8E24AA,color:#000
```
Once you can describe a random variable using a pmf or pdf, the next step is to use
named distributions that appear repeatedly in real data and in ML models.
Key takeaway:
Named distributions give you ready-made probability models for common patterns:
binary outcomes, counts, and measurement noise.
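A quick sketch of those three patterns using scipy.stats (assuming SciPy is available; the parameter values are illustrative):

```python
from scipy import stats

# Binary outcomes: Bernoulli(p), e.g. click / no-click with p = 0.3
bern = stats.bernoulli(0.3)
print(bern.pmf(1))                       # P(X = 1) = 0.3

# Counts: Poisson(rate), e.g. arrivals per hour with rate 4
pois = stats.poisson(4)
print(pois.pmf(2))                       # P(exactly 2 arrivals)

# Measurement noise: Normal(mean, standard deviation)
noise = stats.norm(loc=0.0, scale=1.5)
print(noise.pdf(0.0))                    # density at the mean
print(noise.cdf(3.0) - noise.cdf(-3.0))  # P(|X| <= 3) ≈ 0.954
```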
Machine Learning is built on mathematical principles that allow models to represent data, learn patterns, and optimise performance.
```mermaid
flowchart LR
DATA[Data]
MATH[Math Models]
OPT[Optimisation]
MODEL[Trained Model]
DATA --> MATH
MATH --> OPT
OPT --> MODEL
```
Understanding how ML algorithms work internally requires core mathematical tools. Algebra deals with relationships between variables and quantities, while calculus focuses on change and optimisation.
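To make the calculus-and-optimisation link concrete, here is a minimal sketch (illustrative, not from the source): gradient descent fits a one-parameter linear model by repeatedly stepping against the derivative of a squared loss.

```python
import numpy as np

# Toy data: y ≈ 2x plus a little noise (illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + rng.normal(0.0, 0.1, 100)

# Model: y_hat = w * x; loss: mean squared error.
# Calculus gives the gradient: dL/dw = -2 * mean(x * (y - w * x)).
w, lr = 0.0, 0.1
for _ in range(200):
    grad = -2.0 * np.mean(x * (y - w * x))
    w -= lr * grad  # step downhill along the loss surface

print(w)  # converges close to 2.0
```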