Common Probability Distributions #
Once you can describe a random variable using a pmf or pdf, the next step is to use named distributions that appear repeatedly in real data and in ML models.
Key takeaway: Named distributions give you ready-made probability models for common patterns: binary outcomes, counts, and measurement noise.
1) Bernoulli distribution (binary) #
Use when: one trial has two outcomes (success/failure).
Support: \( x\in\{0,1\} \)
\[ P(X=x)=p^x(1-p)^{1-x},\quad x\in\{0,1\} \]
Mean and variance:
\[ E(X)=p,\quad V(X)=p(1-p) \]
ML connection: binary labels, click/no-click, churn/no-churn.
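As a quick check, the mean and variance formulas can be recovered directly from the pmf definition. A minimal sketch (the value p = 0.3 and the name `bernoulli_pmf` are illustrative, not from the text):

```python
# Sketch: evaluate the Bernoulli pmf and recover E(X) and V(X)
# directly from the definition. p = 0.3 is an arbitrary example value.
p = 0.3

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pmf(x, p) for x in (0, 1))
# mean equals p, and var equals p * (1 - p), matching the formulas above
```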
2) Binomial distribution (number of successes in n trials) #
Use when:
- fixed number of independent trials \( n \)
- constant success probability \( p \)
- count successes \( X \)
Support: \( x=0,1,2,\dots,n \)
\[ P(X=x)=\binom{n}{x}p^x(1-p)^{n-x} \]
Mean and variance:
\[ E(X)=np,\quad V(X)=np(1-p) \]
ML connection: how many “positive” outcomes in n repeated trials (quality checks, conversions, etc.).
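The pmf is easy to evaluate with the standard library. A minimal sketch (n = 10 and p = 0.5 are arbitrary example values; `binomial_pmf` is an illustrative name), verifying that the pmf sums to 1 and that the mean comes out to np:

```python
from math import comb  # binomial coefficient C(n, x)

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5  # arbitrary example: 10 trials, fair-coin success rate
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))   # sums to 1
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))  # equals n*p
```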
3) Poisson distribution (counts over time/space) #
Use when: you count events in a fixed window and events occur independently at an average rate.
Parameter: \( \lambda>0 \) (often written as \( m \) in some texts)
Support: \( x=0,1,2,\dots \)
\[ P(X=x)=e^{-\lambda}\frac{\lambda^x}{x!} \]
Mean and variance:
\[ E(X)=\lambda,\quad V(X)=\lambda \]
ML connection: call arrivals per minute, defects per metre, incidents per day.
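The equality of mean and variance is the Poisson's signature, and it can be checked numerically from the pmf. A minimal sketch (λ = 4 is an arbitrary example rate; the support is truncated at 100 terms, beyond which the tail is negligible):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0  # arbitrary example rate
# Truncate the infinite support at 100 terms; the remaining tail is tiny.
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
var = sum((x - mean)**2 * poisson_pmf(x, lam) for x in range(100))
# Both mean and var come out (numerically) equal to lam.
```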
4) Normal (Gaussian) distribution (measurement noise) #
Use when: values cluster around a mean with symmetric “bell-shaped” variation.
Parameters: mean \( \mu \), standard deviation \( \sigma \)
\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]
Standard normal: \( Z\sim N(0,1) \).
Why Gaussian appears everywhere: Many small independent effects add up to something close to normal. In ML: Gaussian noise assumptions lead to squared-error loss.
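One concrete consequence of the bell shape: about 68% of the probability mass lies within one standard deviation of the mean. A minimal sketch checking this by integrating the pdf numerically (μ = 10 and σ = 2 are arbitrary example parameters):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 10.0, 2.0  # arbitrary example parameters
step = 0.001
# Riemann sum of the pdf over [mu - sigma, mu + sigma]:
prob = sum(normal_pdf(mu - sigma + i * step, mu, sigma) * step
           for i in range(int(2 * sigma / step)))
# prob comes out close to 0.683 -- the "68% within one sigma" rule
```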
5) t, Chi-square, F (intro) #
These distributions matter later for inference, but you should recognise where they come from.
5.1 t distribution (small-sample uncertainty in the mean) #
When sampling from a normal population with unknown variance, the statistic
\[ T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \]
follows a t distribution with \( n-1 \) degrees of freedom, where \( S \) is the sample standard deviation.
Key idea: t is like the normal distribution but with heavier tails (more uncertainty).
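A minimal sketch of computing the statistic itself from one small sample (the seed, sample size n = 8, and population parameters are arbitrary; this produces a single draw of T, not its distribution):

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
mu, n = 5.0, 8  # true mean and a deliberately small sample size
sample = [random.gauss(mu, 2.0) for _ in range(n)]

xbar = statistics.mean(sample)     # sample mean, X-bar
s = statistics.stdev(sample)       # sample standard deviation, S
t = (xbar - mu) / (s / n**0.5)     # t statistic with n - 1 = 7 df
```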
5.2 Chi-square distribution (variation / sum of squares) #
A chi-square distribution has one parameter: degrees of freedom \( \nu \).
It is positive-valued and is built from sums of squared normal variables.
You will see it in: confidence intervals and tests about variance.
5.3 F distribution (ratio of variances) #
An F distribution is related to a ratio of two scaled chi-square variables.
You will see it in: ANOVA and comparing two variances.
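Both constructions above can be made concrete by simulation. A minimal sketch (the seed and the degrees of freedom df1 = 5, df2 = 10 are arbitrary): a chi-square draw is a sum of squared standard normals, and an F draw is a ratio of two independent chi-squares, each divided by its own degrees of freedom:

```python
import random

random.seed(1)  # fixed seed for reproducibility

def chi_square_sample(df):
    """One draw from a chi-square: sum of df squared standard normals."""
    return sum(random.gauss(0, 1)**2 for _ in range(df))

df1, df2 = 5, 10  # arbitrary degrees of freedom
# F variable: ratio of two independent chi-squares, each divided by its df
f = (chi_square_sample(df1) / df1) / (chi_square_sample(df2) / df2)
# Both the chi-square draws and f are strictly positive,
# matching the supports in the summary table.
```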
6) Quick selection guide #
```mermaid
flowchart TD
    A[What type of data are you modelling?] --> B[Binary outcome]
    A --> C[Count of events]
    A --> D[Continuous measurement]
    B --> B1["Bernoulli (single trial)"]
    B --> B2["Binomial (n trials)"]
    C --> C1["Poisson (rate-based counts)"]
    D --> D1["Normal (measurement noise)"]
    D --> D2["t / Chi-square / F<br/>used in inference"]
```
7) Summary table #
| Distribution | Typical data | Parameters | Support |
|---|---|---|---|
| Bernoulli | yes/no | p | 0, 1 |
| Binomial | successes in n trials | n, p | 0..n |
| Poisson | counts in a fixed window | λ | 0,1,2,… |
| Normal | continuous measurements | μ, σ | all real numbers |
| t | mean inference (small n) | df | all real numbers |
| Chi-square | variance / sums of squares | df | ≥ 0 |
| F | ratio of variances | df1, df2 | ≥ 0 |
References #
- Devore: Ch. 3 (Bernoulli, Binomial, Poisson) and Ch. 4 (Normal, Chi-square) plus later chapters for t and F
- ISM sessions: use this as the syllabus-aligned guide