<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Probability on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/probability/</link><description>Recent content in Probability on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 18 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/probability/index.xml" rel="self" type="application/rss+xml"/><item><title>Formula Sheet</title><link>https://arshadhs.github.io/docs/ai/statistics/00_formulas/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/00_formulas/</guid><description>&lt;h1 id="formula-sheet">
 Formula Sheet
 
 &lt;a class="anchor" href="#formula-sheet">#&lt;/a>
 
&lt;/h1>
&lt;p>This page is a quick reference of &lt;strong>definitions + formulas&lt;/strong>, grouped by module.&lt;/p>
&lt;hr>
&lt;h2 id="notation">
 Notation
 
 &lt;a class="anchor" href="#notation">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Sample size: 
&lt;link rel="stylesheet" href="https://arshadhs.github.io/katex/katex.min.css" />
&lt;script defer src="https://arshadhs.github.io/katex/katex.min.js">&lt;/script>

 &lt;script defer src="https://arshadhs.github.io/katex/auto-render.min.js" onload="renderMathInElement(document.body, {
 &amp;#34;delimiters&amp;#34;: [
 {&amp;#34;left&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;display&amp;#34;: true},
 {&amp;#34;left&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\(&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\)&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\[&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\]&amp;#34;, &amp;#34;display&amp;#34;: true}
 ]
});">&lt;/script>

&lt;span>
 \( n \)
 &lt;/span>

 (sample), 
&lt;span>
 \( N \)
 &lt;/span>

 (population)&lt;/li>
&lt;li>Sample mean: 
&lt;span>
 \( \bar{x} \)
 &lt;/span>

, population mean: 
&lt;span>
 \( \mu \)
 &lt;/span>

&lt;/li>
&lt;li>Sample variance: 
&lt;span>
 \( s^2 \)
 &lt;/span>

, population variance: 
&lt;span>
 \( \sigma^2 \)
 &lt;/span>

&lt;/li>
&lt;li>Sample SD: 
&lt;span>
 \( s \)
 &lt;/span>

, population SD: 
&lt;span>
 \( \sigma \)
 &lt;/span>

&lt;/li>
&lt;li>Complement: 
&lt;span>
 \( A^c \)
 &lt;/span>

&lt;/li>
&lt;li>Intersection (“and”): 
&lt;span>
 \( A\cap B \)
 &lt;/span>

, union (“or”): 
&lt;span>
 \( A\cup B \)
 &lt;/span>

&lt;/li>
&lt;li>Conditional probability: 
&lt;span>
 \( P(A\mid B) \)
 &lt;/span>

&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="1-basic-probability--statistics">
 1. Basic Probability &amp;amp; Statistics
 
 &lt;a class="anchor" href="#1-basic-probability--statistics">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="11-measures-of-central-tendency">
 1.1 Measures of Central Tendency
 
 &lt;a class="anchor" href="#11-measures-of-central-tendency">#&lt;/a>
 
&lt;/h2>
&lt;h3 id="arithmetic-mean">
 Arithmetic mean
 
 &lt;a class="anchor" href="#arithmetic-mean">#&lt;/a>
 
&lt;/h3>
&lt;p>Sample mean (ungrouped):&lt;/p></description></item><item><title>Stats Formula Sheet</title><link>https://arshadhs.github.io/docs/ai/statistics/ism-formula-sheet/</link><pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/ism-formula-sheet/</guid><description>&lt;h1 id="stats-formula-sheet">
 Stats Formula Sheet
 
 &lt;a class="anchor" href="#stats-formula-sheet">#&lt;/a>
 
&lt;/h1>
&lt;p>Keep this page as a quick reference of &lt;strong>definitions + formulas&lt;/strong>.&lt;/p>
&lt;hr>
&lt;h2 id="notation">
 Notation
 
 &lt;a class="anchor" href="#notation">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Sample size: 
&lt;span>
 \( n \)
 &lt;/span>

 (sample), 
&lt;span>
 \( N \)
 &lt;/span>

 (population)&lt;/li>
&lt;li>Mean: 
&lt;span>
 \( \bar{x} \)
 &lt;/span>

 (sample), 
&lt;span>
 \( \mu \)
 &lt;/span>

 (population)&lt;/li>
&lt;li>Variance: 
&lt;span>
 \( s^2 \)
 &lt;/span>

 (sample), 
&lt;span>
 \( \sigma^2 \)
 &lt;/span>

 (population)&lt;/li>
&lt;li>Standard deviation: 
&lt;span>
 \( s \)
 &lt;/span>

 (sample), 
&lt;span>
 \( \sigma \)
 &lt;/span>

 (population)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="module-1-basic-statistics">
 Module 1: Basic Statistics
 
 &lt;a class="anchor" href="#module-1-basic-statistics">#&lt;/a>
 
&lt;/h2>
&lt;h3 id="measures-of-central-tendency">
 Measures of Central Tendency
 
 &lt;a class="anchor" href="#measures-of-central-tendency">#&lt;/a>
 
&lt;/h3>
&lt;p>&lt;strong>Sample mean (ungrouped):&lt;/strong>&lt;/p></description></item><item><title>Conditional Probability &amp; Bayes’ Theorem</title><link>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/</guid><description>&lt;h1 id="conditional-probability--bayes-theorem">
 Conditional Probability &amp;amp; Bayes’ Theorem
 
 &lt;a class="anchor" href="#conditional-probability--bayes-theorem">#&lt;/a>
 
&lt;/h1>
&lt;p>Probability often changes when we &lt;strong>learn new information&lt;/strong>.&lt;/p>
&lt;p>Conditional probability and Bayes’ theorem give a structured way to &lt;strong>update beliefs&lt;/strong> using evidence.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Conditional probability updates probabilities after observing an event.&lt;/p>
&lt;p>Bayes’ theorem lets you estimate a hidden cause from observed evidence.&lt;/p>
&lt;p>Naïve Bayes turns Bayes’ theorem into a practical classifier by assuming conditional independence of features given the class.&lt;/p>
&lt;/blockquote>
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart TD

A[Conditional&amp;lt;br/&amp;gt;probability] --&amp;gt;|foundation| B[Bayes&amp;lt;br/&amp;gt;theorem]
D[Independent&amp;lt;br/&amp;gt;events] --&amp;gt;|implies| C[Independence]
C --&amp;gt;|simplifies| A

E[Prior] --&amp;gt;|with likelihood| B
F[Likelihood] --&amp;gt;|updates| H[Posterior]
G[Evidence] --&amp;gt;|normalises| B
B --&amp;gt;|yields| H

I[Naïve&amp;lt;br/&amp;gt;Bayes] --&amp;gt;|uses| B
J[Naïve&amp;lt;br/&amp;gt;assumption] --&amp;gt;|assumes| C
K[Features] --&amp;gt;|given class| J
L[Class] --&amp;gt;|conditions| J
I --&amp;gt;|predicts| M[Classification]
M --&amp;gt;|selects| L

style A fill:#90CAF9,stroke:#1E88E5,color:#000
style B fill:#90CAF9,stroke:#1E88E5,color:#000
style C fill:#90CAF9,stroke:#1E88E5,color:#000

style D fill:#CE93D8,stroke:#8E24AA,color:#000
style E fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style G fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style K fill:#CE93D8,stroke:#8E24AA,color:#000
style L fill:#CE93D8,stroke:#8E24AA,color:#000

style H fill:#C8E6C9,stroke:#2E7D32,color:#000
style I fill:#C8E6C9,stroke:#2E7D32,color:#000
style M fill:#C8E6C9,stroke:#2E7D32,color:#000

&lt;/pre>

&lt;hr>
&lt;h2 id="quick-summary">
 Quick summary
 
 &lt;a class="anchor" href="#quick-summary">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Conditional probability:
updates probability after an event is known.&lt;/li>
&lt;li>Multiplication rule:
computes joint probability from conditional parts.&lt;/li>
&lt;li>Independence:
tested using 
&lt;link rel="stylesheet" href="https://arshadhs.github.io/katex/katex.min.css" />
&lt;script defer src="https://arshadhs.github.io/katex/katex.min.js">&lt;/script>

 &lt;script defer src="https://arshadhs.github.io/katex/auto-render.min.js" onload="renderMathInElement(document.body, {
 &amp;#34;delimiters&amp;#34;: [
 {&amp;#34;left&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;display&amp;#34;: true},
 {&amp;#34;left&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\(&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\)&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\[&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\]&amp;#34;, &amp;#34;display&amp;#34;: true}
 ]
});">&lt;/script>

&lt;span>
 \( P(A\cap B)=P(A)P(B) \)
 &lt;/span>

.&lt;/li>
&lt;li>Total probability:
breaks a probability into weighted cases.&lt;/li>
&lt;li>Bayes’ theorem:
reverses conditioning to infer causes from evidence.&lt;/li>
&lt;/ul>
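These rules can be checked numerically. A minimal sketch in Python, where the probabilities (0.4, 0.5, 0.25) are made up purely for illustration:

```python
# Illustrative numbers only: P(A) = 0.4, P(B) = 0.5, P(A and B) = 0.25.
p_a, p_b, p_joint = 0.4, 0.5, 0.25

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_joint / p_b                      # 0.5

# Multiplication rule recovers the joint probability
assert abs(p_a_given_b * p_b - p_joint) < 1e-12

# Independence test: does P(A and B) equal P(A) * P(B)?
independent = abs(p_joint - p_a * p_b) < 1e-12   # False here: 0.25 != 0.2

# Bayes' theorem reverses the conditioning: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a            # 0.625
assert abs(p_b_given_a - p_joint / p_a) < 1e-12

print(p_a_given_b, independent, p_b_given_a)
```

Here A and B are dependent, and Bayes' theorem gives the same answer as computing the joint probability divided by the new condition directly.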
&lt;hr>
&lt;h2 id="whats-next">
 What’s next
 
 &lt;a class="anchor" href="#whats-next">#&lt;/a>
 
&lt;/h2>
&lt;p>Probability Distributions&lt;br>
Move from events to random variables and distributions.&lt;/p></description></item><item><title>Conditional Probability</title><link>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/021_conditional_prob/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/021_conditional_prob/</guid><description>&lt;h1 id="conditional-probability">
 Conditional Probability
 
 &lt;a class="anchor" href="#conditional-probability">#&lt;/a>
 
&lt;/h1>
&lt;p>Conditional probability updates the probability of an event when new information is available.&lt;/p>
&lt;p>It shows up whenever a question says:&lt;/p>
&lt;ul>
&lt;li>“given that…”&lt;/li>
&lt;li>“among those who…”&lt;/li>
&lt;li>“out of the items that…”&lt;/li>
&lt;li>“if it does not fail immediately…”&lt;/li>
&lt;/ul>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
Conditional probability is always:&lt;/p>
&lt;p>joint probability ÷ probability of the condition.&lt;/p>
&lt;p>The condition must not be an impossible event.&lt;/p>
&lt;/blockquote>
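A minimal sketch of "joint ÷ condition" in Python, assuming hypothetical counts for a batch of items and a machine M:

```python
# Hypothetical counts: 100 items, 40 made by machine M,
# 12 items are both defective AND from machine M.
total = 100
from_m = 40
defective_and_from_m = 12

p_condition = from_m / total            # P(from M) = 0.4
p_joint = defective_and_from_m / total  # P(defective and from M) = 0.12

# The condition must not be an impossible event
assert p_condition > 0

# P(defective | from M) = joint probability / probability of the condition
p_defective_given_m = p_joint / p_condition
print(round(p_defective_given_m, 4))  # 0.3
```

Note how "given that the item is from M" shrinks the denominator from all 100 items down to the 40 that satisfy the condition.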
&lt;hr>
&lt;h2 id="prior-vs-posterior">
 Prior vs posterior
 
 &lt;a class="anchor" href="#prior-vs-posterior">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Prior probability:
probability with no condition (before new information)&lt;/p></description></item><item><title>Bayes’ Theorem</title><link>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/022_bayes_theorem/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/022_bayes_theorem/</guid><description>&lt;h1 id="bayes-theorem">
 Bayes’ Theorem
 
 &lt;a class="anchor" href="#bayes-theorem">#&lt;/a>
 
&lt;/h1>
&lt;h3 id="21-total-probability-needed-for-bayes">
 2.1 Total probability (needed for Bayes)
 
 &lt;a class="anchor" href="#21-total-probability-needed-for-bayes">#&lt;/a>
 
&lt;/h3>
&lt;p>Often we split the world into cases 
&lt;span>
 \( E_1,E_2,\dots,E_k \)
 &lt;/span>

 that:&lt;/p>
&lt;ul>
&lt;li>are mutually exclusive&lt;/li>
&lt;li>cover the whole sample space&lt;/li>
&lt;/ul>
&lt;p>Then for any event 
&lt;span>
 \( A \)
 &lt;/span>

:&lt;/p>
&lt;span style="color: red;">
 &lt;span>
 \[ 
P(A)=\sum_{i=1}^{k} P(A\mid E_i)\,P(E_i)
 \]
 &lt;/span>
&lt;/span>
&lt;p>Tree intuition:&lt;/p>


&lt;pre class="mermaid">
flowchart TD
 S[Start] --&amp;gt; E1[Case E1]
 S --&amp;gt; E2[Case E2]
 S --&amp;gt; E3[Case E3]
 E1 --&amp;gt; A1[&amp;#34;A happens&amp;#34;]
 E2 --&amp;gt; A2[&amp;#34;A happens&amp;#34;]
 E3 --&amp;gt; A3[&amp;#34;A happens&amp;#34;]
&lt;/pre>
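Summing over the branches of the tree can be sketched in Python; the case probabilities below are hypothetical:

```python
# Hypothetical: three mutually exclusive, exhaustive cases E1..E3.
p_cases = [0.5, 0.3, 0.2]            # P(E_i); must sum to 1
p_a_given_case = [0.1, 0.4, 0.8]     # P(A | E_i)

# The cases must cover the whole sample space
assert abs(sum(p_cases) - 1.0) < 1e-12

# Total probability: P(A) = sum_i P(A|E_i) P(E_i)
p_a = sum(pa * pe for pa, pe in zip(p_a_given_case, p_cases))

# Bayes then reverses a branch: P(E2|A) = P(A|E2) P(E2) / P(A)
posterior_e2 = p_a_given_case[1] * p_cases[1] / p_a

print(round(p_a, 4), round(posterior_e2, 4))
```

P(A) comes out to 0.33, and the posterior weight of case E2 given that A happened is 0.12 / 0.33 ≈ 0.3636.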

&lt;hr>
&lt;h3 id="22-bayes-theorem-two-event-form">
 2.2 Bayes’ theorem (two-event form)
 
 &lt;a class="anchor" href="#22-bayes-theorem-two-event-form">#&lt;/a>
 
&lt;/h3>
&lt;p>Bayes&amp;rsquo; Theorem is a mathematical formula used to determine the &lt;strong>conditional probability of an event based on prior knowledge and new evidence&lt;/strong>.&lt;/p></description></item><item><title>Naïve Bayes</title><link>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/023_naive_bayes/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/conditional-probability/023_naive_bayes/</guid><description>&lt;h1 id="naïve-bayes">
 Naïve Bayes
 
 &lt;a class="anchor" href="#na%c3%afve-bayes">#&lt;/a>
 
&lt;/h1>
&lt;p>Naïve Bayes is a &lt;strong>probabilistic classifier&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>Supervised Learning Problem&lt;/li>
&lt;li>Binary classification: the target variable takes one of two classes&lt;/li>
&lt;li>The hypothesis is the class you want to assign to an instance&lt;/li>
&lt;li>The total (prior) probability of Yes and No is computed first, from the training data&lt;/li>
&lt;li>The posterior is what you obtain after studying the observed data&lt;/li>
&lt;li>The instance is classified into the class whose hypothesis has the maximum posterior probability&lt;/li>
&lt;/ul>
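The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the tiny weather/play dataset and the helper names are invented for the example, and no smoothing is applied, so a zero count zeroes the whole product.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (weather, temperature) -> play? label
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]

# Priors: P(class), computed before looking at any instance
class_counts = Counter(label for _, label in data)
n = len(data)

# Likelihood counts: how often each feature value occurs per class
feat_counts = defaultdict(Counter)
for features, label in data:
    for pos, value in enumerate(features):
        feat_counts[(pos, label)][value] += 1

def posterior_score(features, label):
    # P(label) * prod_i P(x_i | label), using the naive independence assumption
    score = class_counts[label] / n
    for pos, value in enumerate(features):
        score *= feat_counts[(pos, label)][value] / class_counts[label]
    return score

def classify(features):
    # Pick the class with the maximum posterior score
    return max(class_counts, key=lambda lbl: posterior_score(features, lbl))

print(classify(("rainy", "mild")))  # yes
```

The denominator P(features) is the same for every class, so comparing unnormalised scores is enough to pick the winner.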
&lt;p>It predicts a class label by computing:&lt;/p></description></item><item><title>Probability Distributions</title><link>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/</link><pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/</guid><description>&lt;h1 id="probability-distributions">
 Probability Distributions
 
 &lt;a class="anchor" href="#probability-distributions">#&lt;/a>
 
&lt;/h1>
&lt;p>Probability distributions are the bridge between:
real-world randomness and mathematical modelling.&lt;/p>
&lt;p>A random experiment produces outcomes.
A random variable turns those outcomes into numbers.
A probability distribution tells you how likely each number (or range of numbers) is.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
A distribution is a complete “story” about uncertainty:
what values are possible, how likely they are, and how we summarise them (mean, variance).&lt;/p>
&lt;/blockquote>
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart TD
	PD[&amp;#34;Probability&amp;lt;br/&amp;gt;distributions&amp;#34;] --&amp;gt; RV[&amp;#34;Random&amp;lt;br/&amp;gt;variables&amp;#34;]
	PD[&amp;#34;Probability&amp;lt;br/&amp;gt;distributions&amp;#34;] --&amp;gt; DS[&amp;#34;Common&amp;lt;br/&amp;gt;distributions&amp;#34;]

	style PD fill:#90CAF9,stroke:#1E88E5,color:#000
	style RV fill:#90CAF9,stroke:#1E88E5,color:#000
	style DS fill:#90CAF9,stroke:#1E88E5,color:#000
&lt;/pre>

&lt;hr>
&lt;h2 id="aiml-connection">
 AI/ML Connection
 
 &lt;a class="anchor" href="#aiml-connection">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Many ML models are probabilistic:
they assume data (or errors) follow a distribution.&lt;/li>
&lt;li>Loss functions often come from distribution assumptions:
squared loss aligns with Gaussian noise.&lt;/li>
&lt;li>Naïve Bayes (from the previous module) becomes practical once you can model:

&lt;link rel="stylesheet" href="https://arshadhs.github.io/katex/katex.min.css" />
&lt;script defer src="https://arshadhs.github.io/katex/katex.min.js">&lt;/script>

 &lt;script defer src="https://arshadhs.github.io/katex/auto-render.min.js" onload="renderMathInElement(document.body, {
 &amp;#34;delimiters&amp;#34;: [
 {&amp;#34;left&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;display&amp;#34;: true},
 {&amp;#34;left&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\(&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\)&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\[&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\]&amp;#34;, &amp;#34;display&amp;#34;: true}
 ]
});">&lt;/script>

&lt;span>
 \( P(X\mid Y) \)
 &lt;/span>

 using suitable distributions.&lt;/li>
&lt;/ul>
&lt;blockquote class="book-hint warning">
&lt;p>In practice:
choosing a distribution is a modelling decision.
It affects:
prediction, uncertainty estimates, and what “rare” or “typical” means in your data.&lt;/p></description></item><item><title>Random Variables</title><link>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/random-variables/</link><pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/random-variables/</guid><description>&lt;h1 id="random-variables">
 Random Variables
 
 &lt;a class="anchor" href="#random-variables">#&lt;/a>
 
&lt;/h1>
&lt;p>A random variable is a way to attach numbers to outcomes of a random experiment.&lt;/p>
&lt;p>It lets us move from:
“what happened?”
to:
“what number should we analyse?”&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
A random variable is a &lt;em>function&lt;/em> from the sample space to real numbers.
Once you define the random variable clearly, the rest (pmf/pdf/cdf, mean, variance) becomes systematic.&lt;/p>
&lt;/blockquote>
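A minimal sketch of this definition in Python: the sample space of two fair coin tosses, a random variable X = number of heads, and the pmf, mean and variance that follow systematically from it.

```python
from itertools import product

# Sample space: all outcomes of two fair coin tosses
omega = list(product("HT", repeat=2))   # ('H','H'), ('H','T'), ...
p_outcome = 1 / len(omega)              # each outcome equally likely

# Random variable X: a function from outcomes to real numbers
def X(outcome):
    return outcome.count("H")           # number of heads

# pmf of X: push outcome probabilities through X
pmf = {}
for outcome in omega:
    pmf[X(outcome)] = pmf.get(X(outcome), 0) + p_outcome

# Mean and variance follow mechanically from the pmf
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(pmf, mean, variance)  # {2: 0.25, 1: 0.5, 0: 0.25} 1.0 0.5
```

Changing X (say, to "1 if both tosses match, else 0") changes the pmf, but the recipe stays the same.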
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart TD
PD[&amp;#34;Probability&amp;lt;br/&amp;gt;distributions&amp;#34;] --&amp;gt; RV[&amp;#34;Random&amp;lt;br/&amp;gt;variables&amp;#34;]

RV --&amp;gt; T[&amp;#34;Types&amp;#34;]
T --&amp;gt; RV1[&amp;#34;Discrete&amp;lt;br/&amp;gt;RVs&amp;#34;]
T --&amp;gt; RV2[&amp;#34;Continuous&amp;lt;br/&amp;gt;RVs&amp;#34;]

RV --&amp;gt; F[&amp;#34;PMF / PDF / CDF&amp;#34;]
RV --&amp;gt; S[&amp;#34;Mean / Variance&amp;lt;br/&amp;gt;Covariance&amp;#34;]
RV --&amp;gt; J[&amp;#34;Joint &amp;amp; Marginal&amp;lt;br/&amp;gt;distributions&amp;#34;]
RV --&amp;gt; X[&amp;#34;Transformations&amp;#34;]

style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style RV fill:#90CAF9,stroke:#1E88E5,color:#000

style T fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style S fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style X fill:#CE93D8,stroke:#8E24AA,color:#000
style RV1 fill:#CE93D8,stroke:#8E24AA,color:#000
style RV2 fill:#CE93D8,stroke:#8E24AA,color:#000
&lt;/pre>

&lt;hr>
&lt;h2 id="1-definition">
 1) Definition
 
 &lt;a class="anchor" href="#1-definition">#&lt;/a>
 
&lt;/h2>
&lt;p>Random variable:
a rule that assigns a number to each outcome.&lt;/p></description></item><item><title>Common Probability Distributions</title><link>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/common-distributions/</link><pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/probability_distributions/common-distributions/</guid><description>&lt;h1 id="common-probability-distributions">
 Common Probability Distributions
 
 &lt;a class="anchor" href="#common-probability-distributions">#&lt;/a>
 
&lt;/h1>
&lt;p>Once you can describe a random variable using a pmf or pdf, the next step is to use
named distributions that appear repeatedly in real data and in ML models.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
Named distributions give you ready-made probability models for common patterns:
binary outcomes, counts, and measurement noise.&lt;/p>
&lt;/blockquote>
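As a sketch, the discrete pmfs can be written directly from their textbook formulas in pure Python (the function names here are ours, not a library API):

```python
from math import comb, exp, factorial

# Bernoulli(p): one binary trial (success = 1, failure = 0)
def bernoulli_pmf(k, p):
    return p if k == 1 else 1 - p

# Binomial(n, p): number of successes in n independent Bernoulli trials
def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson(lam): count of events occurring at average rate lam
def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

print(round(binomial_pmf(2, 10, 0.5), 6))  # 0.043945
print(round(poisson_pmf(0, 2.0), 6))       # 0.135335
```

Each pmf answers the same question ("how likely is the count k?") under a different data-generating story, which is exactly what makes them reusable models.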
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
PD[&amp;#34;Probability&amp;lt;br/&amp;gt;distributions&amp;#34;] --&amp;gt; DS[&amp;#34;Common&amp;lt;br/&amp;gt;distributions&amp;#34;]

DS --&amp;gt; DIS[&amp;#34;Discrete&amp;#34;]
DS --&amp;gt; CON[&amp;#34;Continuous&amp;#34;]

DIS --&amp;gt; D1[&amp;#34;Bernoulli&amp;#34;]
DIS --&amp;gt; D2[&amp;#34;Binomial&amp;#34;]
DIS --&amp;gt; D3[&amp;#34;Poisson&amp;#34;]

CON --&amp;gt; D4[&amp;#34;Normal&amp;lt;br/&amp;gt;(Gaussian)&amp;#34;]
CON --&amp;gt; D5[&amp;#34;t / Chi-square / F&amp;lt;br/&amp;gt;(intro)&amp;#34;]

style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style DS fill:#90CAF9,stroke:#1E88E5,color:#000

style DIS fill:#CE93D8,stroke:#8E24AA,color:#000
style CON fill:#CE93D8,stroke:#8E24AA,color:#000

style D1 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D2 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D3 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D4 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D5 fill:#C8E6C9,stroke:#2E7D32,color:#000
&lt;/pre>

&lt;hr>
&lt;h2 id="1-bernoulli-distribution-binary">
 1) Bernoulli distribution (binary)
 
 &lt;a class="anchor" href="#1-bernoulli-distribution-binary">#&lt;/a>
 
&lt;/h2>
&lt;p>Use when:
one trial has two outcomes (success/failure).&lt;/p></description></item><item><title>Mathematical Foundation</title><link>https://arshadhs.github.io/docs/ai/maths/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/maths/</guid><description>&lt;h1 id="mathematical-foundations-for-machine-learning">
 Mathematical Foundations for Machine Learning
 
 &lt;a class="anchor" href="#mathematical-foundations-for-machine-learning">#&lt;/a>
 
&lt;/h1>
&lt;p>Machine Learning is built on &lt;strong>mathematical principles&lt;/strong> that allow models to:&lt;/p>
&lt;ul>
&lt;li>represent data&lt;/li>
&lt;li>learn patterns&lt;/li>
&lt;li>optimise performance&lt;/li>
&lt;/ul>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 DATA[Data]
 MATH[Math Models]
 OPT[Optimisation]
 MODEL[Trained Model]

 DATA --&amp;gt; MATH
 MATH --&amp;gt; OPT
 OPT --&amp;gt; MODEL
&lt;/pre>

&lt;p>ML requires &lt;strong>core mathematical tools&lt;/strong> to understand how ML algorithms work internally. Algebra deals with relationships between variables and quantities, while Calculus focuses on change and optimization.&lt;/p></description></item><item><title>Statistics</title><link>https://arshadhs.github.io/docs/ai/statistics/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/</guid><description>&lt;h1 id="statistics">
 Statistics
 
 &lt;a class="anchor" href="#statistics">#&lt;/a>
 
&lt;/h1>
&lt;p>&lt;strong>Statistical methods&lt;/strong> help you turn &lt;strong>raw data into reliable conclusions&lt;/strong>, while understanding &lt;strong>uncertainty, variability, and confidence&lt;/strong>.&lt;/p>
&lt;p>Statistics provides the &lt;strong>language and tools&lt;/strong> for reasoning about data, uncertainty, and inference.&lt;/p>
&lt;p>ML depends on &lt;strong>understanding data behaviour&lt;/strong>, drawing conclusions, and validating machine learning models.&lt;/p>
&lt;ul>
&lt;li>Collect Data&lt;/li>
&lt;li>Present &amp;amp; Organise Data (in a systematic manner)&lt;/li>
&lt;li>Analyse Data&lt;/li>
&lt;li>Infer about the Data&lt;/li>
&lt;li>Make Decisions from the Data&lt;/li>
&lt;/ul>
&lt;hr>




&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/00_formulas/">Formula Sheet&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/ism-formula-sheet/">Stats Formula Sheet&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/01_basic_statistics/">Basic Statistics&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/01_basic_probability/">Basic Probability&lt;/a>
 &lt;/li>
 
 
 
 
 
 
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/04_hypothesis_testing/">Hypothesis Testing&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/05_prediction_n_forecasting/">Prediction &amp;amp; Forecasting&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/06_prediction_n_forecasting/">Gaussian Mixture model &amp;amp; Expectation Maximization&lt;/a>
 &lt;/li>
 
 

 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/">Conditional Probability &amp;amp; Bayes’ Theorem&lt;/a>

 
 



&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/021_conditional_prob/">Conditional Probability&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/022_bayes_theorem/">Bayes’ Theorem&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/023_naive_bayes/">Naïve Bayes&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>

 &lt;/li>
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/">Probability Distributions&lt;/a>

 
 



&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/random-variables/">Random Variables&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/common-distributions/">Common Probability Distributions&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>

 &lt;/li>
 
&lt;/ul>


&lt;hr>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Statistics Topic&lt;/th>
 &lt;th>What you learn (plain English)&lt;/th>
 &lt;th>ML Connection&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>1. Basic Probability &amp;amp; Statistics&lt;/td>
 &lt;td>Summarise data;&lt;br>understand spread;&lt;br>basic probability rules&lt;/td>
 &lt;td>Data understanding (EDA), feature sanity checks,&lt;br>detecting outliers, interpreting “average behaviour”&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>2. Conditional Probability &amp;amp; Bayes&lt;/td>
 &lt;td>Update probability using new information;&lt;br>Bayes’ rule&lt;/td>
 &lt;td>Naïve Bayes, Bayesian thinking,&lt;br>posterior probabilities, probabilistic classification&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>3. Probability Distributions&lt;/td>
 &lt;td>Model randomness with distributions;&lt;br>expectation/variance/covariance&lt;/td>
 &lt;td>Likelihood models, noise assumptions (Gaussian), sampling,&lt;br>probabilistic modelling foundations&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>4. Hypothesis Testing&lt;/td>
 &lt;td>Sampling, CLT, confidence intervals,&lt;br>significance tests, ANOVA, MLE&lt;/td>
 &lt;td>A/B testing, evaluating model improvements,&lt;br>significance vs noise, parameter estimation (MLE)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>5. Prediction &amp;amp; Forecasting&lt;/td>
 &lt;td>Correlation, regression,&lt;br>time series (AR/MA/ARIMA/SARIMA etc.)&lt;/td>
 &lt;td>Linear regression, forecasting, sequential data modelling, baseline predictive modelling&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>6. GMM &amp;amp; EM&lt;/td>
 &lt;td>Mixtures of Gaussians;&lt;br>iterative estimation with EM&lt;/td>
 &lt;td>Unsupervised learning (soft clustering),&lt;br>density estimation, latent-variable models&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Statistical Methods&amp;lt;br/&amp;gt;AIML ZC418&amp;#34;] --&amp;gt; B[&amp;#34;1. Basic Probability and Statistics&amp;#34;]
 A --&amp;gt; C[&amp;#34;2. Conditional Probability and Bayes&amp;#34;]
 A --&amp;gt; D[&amp;#34;3. Probability Distributions&amp;#34;]
 A --&amp;gt; E[&amp;#34;4. Hypothesis Testing&amp;#34;]
 A --&amp;gt; F[&amp;#34;5. Prediction and Forecasting&amp;#34;]
 A --&amp;gt; G[&amp;#34;6. Gaussian Mixture Model and EM&amp;#34;]

 B --&amp;gt; B1[&amp;#34;Central Tendency&amp;lt;br/&amp;gt;Mean - Median - Mode&amp;#34;]
 B --&amp;gt; B2[&amp;#34;Variability&amp;lt;br/&amp;gt;Range - Variance - SD - Quartiles&amp;#34;]
 B --&amp;gt; B3[&amp;#34;Basic Probability Concepts&amp;#34;]
 B3 --&amp;gt; B31[&amp;#34;Axioms of Probability&amp;#34;]
 B3 --&amp;gt; B32[&amp;#34;Definition of Probability&amp;#34;]
 B3 --&amp;gt; B33[&amp;#34;Mutually Exclusive vs Independent&amp;#34;]

 C --&amp;gt; C1[&amp;#34;Conditional Probability&amp;#34;]
 C --&amp;gt; C2[&amp;#34;Independence (conditional)&amp;#34;]
 C --&amp;gt; C3[&amp;#34;Bayes Theorem&amp;#34;]
 C --&amp;gt; C4[&amp;#34;Naive Bayes (intro)&amp;#34;]

 D --&amp;gt; D1[&amp;#34;Random Variables&amp;lt;br/&amp;gt;Discrete and Continuous&amp;#34;]
 D --&amp;gt; D2[&amp;#34;Expectation - Variance - Covariance&amp;#34;]
 D --&amp;gt; D3[&amp;#34;Transformations of RVs&amp;#34;]
 D --&amp;gt; D4[&amp;#34;Key Distributions&amp;#34;]
 D4 --&amp;gt; D41[&amp;#34;Bernoulli&amp;#34;]
 D4 --&amp;gt; D42[&amp;#34;Binomial&amp;#34;]
 D4 --&amp;gt; D43[&amp;#34;Poisson&amp;#34;]
 D4 --&amp;gt; D44[&amp;#34;Normal (Gaussian)&amp;#34;]
 D4 --&amp;gt; D45[&amp;#34;t - Chi-square - F (intro)&amp;#34;]

 E --&amp;gt; E1[&amp;#34;Sampling&amp;lt;br/&amp;gt;Random and Stratified&amp;#34;]
 E --&amp;gt; E2[&amp;#34;Sampling Distributions&amp;lt;br/&amp;gt;CLT&amp;#34;]
 E --&amp;gt; E3[&amp;#34;Estimation&amp;lt;br/&amp;gt;Confidence Intervals&amp;#34;]
 E --&amp;gt; E4[&amp;#34;Hypothesis Tests&amp;lt;br/&amp;gt;Means and Proportions&amp;#34;]
 E --&amp;gt; E5[&amp;#34;ANOVA&amp;lt;br/&amp;gt;Single and Dual factor&amp;#34;]
 E --&amp;gt; E6[&amp;#34;Maximum Likelihood&amp;#34;]

 F --&amp;gt; F1[&amp;#34;Correlation&amp;#34;]
 F --&amp;gt; F2[&amp;#34;Regression&amp;#34;]
 F --&amp;gt; F3[&amp;#34;Time Series Basics&amp;lt;br/&amp;gt;Components&amp;#34;]
 F --&amp;gt; F4[&amp;#34;Moving Averages&amp;lt;br/&amp;gt;Simple and Weighted&amp;#34;]
 F --&amp;gt; F5[&amp;#34;Time Series Models&amp;#34;]
 F5 --&amp;gt; F51[&amp;#34;AR&amp;#34;]
 F5 --&amp;gt; F52[&amp;#34;ARMA / ARIMA&amp;#34;]
 F5 --&amp;gt; F53[&amp;#34;SARIMA / SARIMAX&amp;#34;]
 F5 --&amp;gt; F54[&amp;#34;VAR / VARMAX&amp;#34;]
 F --&amp;gt; F6[&amp;#34;Exponential Smoothing&amp;#34;]

 G --&amp;gt; G1[&amp;#34;GMM&amp;lt;br/&amp;gt;Mixture of Gaussians&amp;#34;]
 G --&amp;gt; G2[&amp;#34;EM Algorithm&amp;lt;br/&amp;gt;E-step - M-step&amp;#34;]

 B -.-&amp;gt; C
 C -.-&amp;gt; D
 D -.-&amp;gt; E
 E -.-&amp;gt; F
 F -.-&amp;gt; G
&lt;/pre>

&lt;hr>
&lt;h2 id="data---types">
 Data - Types
 
 &lt;a class="anchor" href="#data---types">#&lt;/a>
 
&lt;/h2>


&lt;pre class="mermaid">
flowchart TD
	A[(Data)] --&amp;gt; B[&amp;#34;Categorical (Qualitative)&amp;#34;]
 A --&amp;gt; C[&amp;#34;Numerical (Quantitative)&amp;#34;]

 B --&amp;gt; B1[Nominal]
 B --&amp;gt; B2[Ordinal]

 C --&amp;gt; C1[Discrete]
 C --&amp;gt; C2[Continuous]

 C2 --&amp;gt; C21[Interval]
 C2 --&amp;gt; C22[Ratio]

 %% Styling
 style A fill:#E1F5FE,stroke:#333
 style B fill:#90CAF9,stroke:#333
 style B1 fill:#90CAF9,stroke:#333
 style B2 fill:#90CAF9,stroke:#333
 style C fill:#FFF9C4,stroke:#333
 style C1 fill:#FFF9C4,stroke:#333
 style C2 fill:#FFF9C4,stroke:#333
 style C21 fill:#FFF9C4,stroke:#333
 style C22 fill:#FFF9C4,stroke:#333
&lt;/pre>

&lt;div class="book-steps ">
&lt;ol>
&lt;li>
&lt;h2 id="categorical-qualitative">
 Categorical (Qualitative)
 
 &lt;a class="anchor" href="#categorical-qualitative">#&lt;/a>
 
&lt;/h2>
&lt;p>expresses a qualitative attribute,
e.g. hair color, eye color&lt;/p></description></item><item><title>Principal Component Analysis (PCA)</title><link>https://arshadhs.github.io/docs/ai/maths/010-linear-algebra/07-dimensionality-reduction/pca/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/maths/010-linear-algebra/07-dimensionality-reduction/pca/</guid><description>&lt;h1 id="principal-component-analysis-pca">
 Principal Component Analysis (PCA)
 
 &lt;a class="anchor" href="#principal-component-analysis-pca">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>dimensionality reduction technique&lt;/li>
&lt;li>helps us to &lt;strong>reduce the number of features&lt;/strong> in a dataset while keeping the most important information.&lt;/li>
&lt;li>simplifies complex datasets by transforming correlated features into a smaller set of uncorrelated components.&lt;/li>
&lt;li>uses &lt;strong>linear algebra&lt;/strong> to transform data into &lt;strong>new features&lt;/strong> called principal components.&lt;/li>
&lt;li>finds these by calculating &lt;strong>eigenvectors (directions)&lt;/strong> and &lt;strong>eigenvalues (importance)&lt;/strong> from the &lt;strong>covariance matrix&lt;/strong>.&lt;/li>
&lt;li>PCA &lt;strong>selects the top components with the highest eigenvalues&lt;/strong> and &lt;strong>projects the data onto them to simplify the dataset&lt;/strong>.&lt;/li>
&lt;/ul>
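The steps in the bullets can be sketched for 2-D data in pure Python, using the closed-form eigen-decomposition of a symmetric 2×2 covariance matrix. The dataset below is illustrative only:

```python
from math import sqrt

# Hypothetical 2-D dataset with strongly correlated features
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1),
        (1.5, 1.6), (1.1, 0.9)]
n = len(data)

# 1) Centre the data
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centred = [(x - mx, y - my) for x, y in data]

# 2) Sample covariance matrix [[cxx, cxy], [cxy, cyy]]
cxx = sum(x * x for x, _ in centred) / (n - 1)
cyy = sum(y * y for _, y in centred) / (n - 1)
cxy = sum(x * y for x, y in centred) / (n - 1)

# 3) Eigenvalues of a symmetric 2x2 matrix (closed form)
mean_ev = (cxx + cyy) / 2
delta = sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
ev1, ev2 = mean_ev + delta, mean_ev - delta   # ev1 >= ev2

# 4) Unit eigenvector for the largest eigenvalue: the principal direction
vx, vy = cxy, ev1 - cxx
norm = sqrt(vx * vx + vy * vy)
v1 = (vx / norm, vy / norm)

# 5) Project each centred point onto the top component
scores = [x * v1[0] + y * v1[1] for x, y in centred]

print(round(ev1, 3), round(v1[0], 3), round(v1[1], 3))  # 1.284 0.678 0.735
```

The top eigenvalue carries ev1 / (ev1 + ev2) ≈ 96% of the total variance here, which is why dropping the second component loses little information.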
&lt;blockquote class="book-hint default">
&lt;p>PCA prioritizes the directions where the data varies the most because more variation = more useful information.&lt;/p></description></item></channel></rss>