Common Probability Distributions #

Once you can describe a random variable using a pmf or pdf, the next step is to use named distributions that appear repeatedly in real data and in ML models.

Key takeaway: Named distributions give you ready-made probability models for common patterns: binary outcomes, counts, and measurement noise.


flowchart TD
PD["Probability<br/>distributions"] --> DS["Common<br/>distributions"]

DS --> DIS["Discrete"]
DS --> CON["Continuous"]

DIS --> D1["Bernoulli"]
DIS --> D2["Binomial"]
DIS --> D3["Poisson"]

CON --> D4["Normal<br/>(Gaussian)"]
CON --> D5["t / Chi-square / F<br/>(intro)"]

style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style DS fill:#90CAF9,stroke:#1E88E5,color:#000

style DIS fill:#CE93D8,stroke:#8E24AA,color:#000
style CON fill:#CE93D8,stroke:#8E24AA,color:#000

style D1 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D2 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D3 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D4 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D5 fill:#C8E6C9,stroke:#2E7D32,color:#000

1) Bernoulli distribution (binary) #

Use when: one trial has two outcomes (success/failure).

Support: \( x\in\{0,1\} \)

\[ P(X=x)=p^x(1-p)^{1-x},\quad x\in\{0,1\} \]

Mean and variance:

\[ E(X)=p,\quad V(X)=p(1-p) \]

ML connection: binary labels, click/no-click, churn/no-churn.
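The pmf, mean, and variance above can be computed directly; a minimal sketch in Python (p = 0.3 is an arbitrary example click-through probability):

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

p = 0.3                 # hypothetical click-through probability
mean = p                # E(X) = p
variance = p * (1 - p)  # V(X) = p(1 - p)
```

Note that the pmf formula is just a compact way of writing "p if x = 1, otherwise 1 − p".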


2) Binomial distribution (number of successes in n trials) #

Use when:

  • fixed number of independent trials \( n \)
  • constant success probability \( p \)
  • count successes \( X \)

Support: \( x=0,1,2,\dots,n \)

\[ P(X=x)=\binom{n}{x}p^x(1-p)^{n-x} \]

Mean and variance:

\[ E(X)=np,\quad V(X)=np(1-p) \]

ML connection: how many “positive” outcomes in \( n \) repeated trials (quality checks, conversions, etc.).
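The pmf, mean, and variance formulas can be verified numerically by summing over the support; a minimal sketch (n = 10 and p = 0.2 are arbitrary example values):

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.2  # hypothetical: 10 independent checks, 20% success rate
support = range(n + 1)
total = sum(binomial_pmf(x, n, p) for x in support)                # should be 1
mean = sum(x * binomial_pmf(x, n, p) for x in support)             # equals n*p
var = sum((x - mean)**2 * binomial_pmf(x, n, p) for x in support)  # equals n*p*(1-p)
```

Summing `x * pmf(x)` over the support is just the definition of \( E(X) \); here it recovers \( np = 2 \) and \( np(1-p) = 1.6 \).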

2.1 Binomial distribution: translating question phrases #

| Phrase in questions | Meaning in maths | Example for a binomial RV \( X \) |
| --- | --- | --- |
| “at least” | \( X \ge k \) | \( P(X \ge 3) \) |
| “more than” / “greater than” | \( X > k \) | \( P(X > 3)=P(X \ge 4) \) |
| “fewer than” / “less than” | \( X < k \) | \( P(X < 3)=P(X \le 2) \) |
| “no more than” / “at most” | \( X \le k \) | \( P(X \le 3) \) |
| “exactly” | \( X = k \) | \( P(X = 3) \) |
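Each phrase translates into a pmf or cumulative sum; a minimal sketch of the translations (n = 10 and p = 0.5 are arbitrary example values):

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): cumulative sum of the pmf."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

n, p = 10, 0.5  # hypothetical example values
p_at_least_3 = 1 - binomial_cdf(2, n, p)   # "at least 3":  P(X >= 3)
p_more_than_3 = 1 - binomial_cdf(3, n, p)  # "more than 3": P(X >= 4)
p_fewer_than_3 = binomial_cdf(2, n, p)     # "fewer than 3": P(X <= 2)
p_at_most_3 = binomial_cdf(3, n, p)        # "at most 3":   P(X <= 3)
p_exactly_3 = binomial_pmf(3, n, p)        # "exactly 3":   P(X = 3)
```

Note the complements: “at least 3” and “fewer than 3” must sum to 1, and “exactly 3” is the gap between “at most 3” and “at most 2”.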

3) Poisson distribution (counts over time/space) #

Use when: you count events in a fixed window (of time or space) and events occur independently at a constant average rate.

Parameter: \( \lambda>0 \) (often written as \( m \) in some texts)

Support: \( x=0,1,2,\dots \)

\[ P(X=x)=e^{-\lambda}\frac{\lambda^x}{x!} \]

Mean and variance:

\[ E(X)=\lambda,\quad V(X)=\lambda \]

ML connection: call arrivals per minute, defects per metre, incidents per day.
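The property that the mean and variance both equal \( \lambda \) can be checked by summing over a truncated support; a minimal sketch (λ = 4 is an arbitrary example rate):

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e**(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 4.0        # hypothetical: an average of 4 calls per minute
xs = range(100)  # truncated support; the tail beyond 100 is negligible for lam = 4
mean = sum(x * poisson_pmf(x, lam) for x in xs)            # approx lam
var = sum((x - lam)**2 * poisson_pmf(x, lam) for x in xs)  # approx lam
```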


4) Normal (Gaussian) distribution (measurement noise) #

Use when: values cluster around a mean with symmetric “bell-shaped” variation.

Parameters: mean \( \mu \) , standard deviation \( \sigma \)

\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]

Standard normal: \( Z\sim N(0,1) \) .

Why Gaussian appears everywhere: by the Central Limit Theorem, many small independent effects add up to something close to normal. In ML, assuming Gaussian noise makes maximum-likelihood fitting equivalent to minimising squared-error loss.
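The noise-to-loss connection follows from taking the log of the pdf: the negative log-likelihood is a constant plus a scaled sum of squared errors, so both are minimised by the same \( \mu \). A minimal sketch (the data values are hypothetical sensor readings):

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def neg_log_likelihood(data, mu, sigma=1.0):
    # -log-likelihood = n*log(sqrt(2*pi)*sigma) + sum((x - mu)**2) / (2*sigma**2),
    # so for fixed sigma, minimising it over mu is minimising squared error
    return -sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

data = [2.1, 1.9, 2.4, 2.0]     # hypothetical sensor readings
mu_hat = sum(data) / len(data)  # the sample mean minimises both criteria
```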


5) t, Chi-square, F (intro) #

These distributions matter later for inference, but you should recognise where they come from.

5.1 t distribution (small-sample uncertainty in the mean) #

When sampling from a normal population with unknown variance, the statistic

\[ T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \]

follows a t distribution with \( n-1 \) degrees of freedom.

Key idea: t is like the normal distribution but with heavier tails (more uncertainty).
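Computing the statistic is mechanical once you have the sample mean and sample standard deviation; a minimal sketch (the sample and the hypothesised mean μ₀ = 5.0 are arbitrary example values):

```python
import math
import statistics

sample = [5.2, 4.8, 5.5, 5.0, 4.9]  # hypothetical measurements
mu0 = 5.0                           # hypothesised population mean

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)        # sample standard deviation S (divides by n - 1)

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1                          # degrees of freedom
```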


5.2 Chi-square distribution (variation / sum of squares) #

A chi-square distribution has one parameter: degrees of freedom \( \nu \) .

It is positive-valued and is built from sums of squared normal variables.

You will see it in: confidence intervals and tests about variance.
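The “sum of squared normals” construction can be simulated directly: summing \( \nu \) squared standard-normal draws produces a chi-square(\( \nu \)) draw, whose mean is \( \nu \). A minimal sketch (ν = 3 and the sample size are arbitrary choices):

```python
import random

random.seed(0)  # reproducible simulation
nu = 3          # degrees of freedom

# a chi-square(nu) draw is a sum of nu squared standard normal draws
draws = [sum(random.gauss(0, 1)**2 for _ in range(nu)) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)  # should be close to nu
all_nonnegative = min(draws) >= 0      # chi-square is positive-valued
```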


5.3 F distribution (ratio of variances) #

An F distribution is related to a ratio of two scaled chi-square variables.

You will see it in: ANOVA and comparing two variances.
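The ratio construction can also be simulated: divide two independent chi-square draws by their degrees of freedom and take the ratio. A minimal sketch (d1 = 5, d2 = 10 are arbitrary example values; the mean of an F distribution is \( d_2/(d_2-2) \) for \( d_2 > 2 \)):

```python
import random

random.seed(0)  # reproducible simulation
d1, d2 = 5, 10  # numerator and denominator degrees of freedom

def chi2_draw(df: int) -> float:
    """One chi-square(df) draw as a sum of df squared standard normals."""
    return sum(random.gauss(0, 1)**2 for _ in range(df))

# F = (chi2(d1)/d1) / (chi2(d2)/d2): a ratio of scaled chi-square variables
draws = [(chi2_draw(d1) / d1) / (chi2_draw(d2) / d2) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)  # E(F) = d2 / (d2 - 2) = 1.25 here
```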


6) Quick selection guide #

flowchart TD
A["What type of data<br/>are you modelling?"] --> B["Binary outcome"]
A --> C["Count of events"]
A --> D["Continuous measurement"]

B --> B1["Bernoulli<br/>(single trial)"]
B --> B2["Binomial<br/>(n trials)"]

C --> C1["Poisson<br/>(rate-based counts)"]

D --> D1["Normal<br/>(measurement noise)"]
D --> D2["t / Chi-square / F<br/>used in inference"]

style A fill:#90CAF9,stroke:#1E88E5,color:#000
style B fill:#CE93D8,stroke:#8E24AA,color:#000
style C fill:#CE93D8,stroke:#8E24AA,color:#000
style D fill:#CE93D8,stroke:#8E24AA,color:#000

style B1 fill:#C8E6C9,stroke:#2E7D32,color:#000
style B2 fill:#C8E6C9,stroke:#2E7D32,color:#000
style C1 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D1 fill:#C8E6C9,stroke:#2E7D32,color:#000
style D2 fill:#C8E6C9,stroke:#2E7D32,color:#000

7) Summary table #

| Distribution | Typical data | Parameters | Support |
| --- | --- | --- | --- |
| Bernoulli | yes/no | \( p \) | 0, 1 |
| Binomial | successes in \( n \) trials | \( n, p \) | \( 0, 1, \dots, n \) |
| Poisson | counts in a fixed window | \( \lambda \) | \( 0, 1, 2, \dots \) |
| Normal | continuous measurements | \( \mu, \sigma \) | all real numbers |
| t | mean inference (small \( n \)) | df | all real numbers |
| Chi-square | variance / sums of squares | df | \( \ge 0 \) |
| F | ratio of variances | df1, df2 | \( \ge 0 \) |

References #

  • Devore: Ch. 3 (Bernoulli, Binomial, Poisson) and Ch. 4 (Normal, Chi-square) plus later chapters for t and F
  • ISM sessions: use this as the syllabus-aligned guide
