Common Probability Distributions #

Once you can describe a random variable using a pmf or pdf, the next step is to use named distributions that appear repeatedly in real data and in ML models.

Key takeaway: Named distributions give you ready-made probability models for common patterns: binary outcomes, counts, and measurement noise.


1) Bernoulli distribution (binary) #

Use when: one trial has two outcomes (success/failure).

Support: \( x\in\{0,1\} \)

\[ P(X=x)=p^x(1-p)^{1-x},\quad x\in\{0,1\} \]

Mean and variance:

\[ E(X)=p,\quad V(X)=p(1-p) \]

ML connection: binary labels, click/no-click, churn/no-churn.
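As a quick sanity check, the pmf above reproduces the stated mean and variance. A minimal sketch using only Python's standard library (the value \( p=0.3 \) is just an illustration):

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return (p ** x) * ((1 - p) ** (1 - x))

p = 0.3
# Mean and variance computed directly from the pmf:
# they match E(X) = p and V(X) = p(1 - p).
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
```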


2) Binomial distribution (number of successes in n trials) #

Use when:

  • fixed number of independent trials \( n \)
  • constant success probability \( p \)
  • count successes \( X \)

Support: \( x=0,1,2,\dots,n \)

\[ P(X=x)=\binom{n}{x}p^x(1-p)^{n-x} \]

Mean and variance:

\[ E(X)=np,\quad V(X)=np(1-p) \]

ML connection: how many “positive” outcomes in n repeated trials (quality checks, conversions, etc.).
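A minimal sketch of the binomial pmf, using `math.comb` for \( \binom{n}{x} \). The values \( n=10 \), \( p=0.2 \) are illustrative; the probabilities sum to 1 and the mean and variance match \( np \) and \( np(1-p) \):

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.2
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                            # should be 1
mean = sum(x * q for x, q in enumerate(probs))                # n*p = 2.0
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))   # n*p*(1-p) = 1.6
```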


3) Poisson distribution (counts over time/space) #

Use when: you count events in a fixed window and events occur independently at an average rate.

Parameter: \( \lambda>0 \) (often written as \( m \) in some texts)

Support: \( x=0,1,2,\dots \)

\[ P(X=x)=e^{-\lambda}\frac{\lambda^x}{x!} \]

Mean and variance:

\[ E(X)=\lambda,\quad V(X)=\lambda \]

ML connection: call arrivals per minute, defects per metre, incidents per day.
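The Poisson pmf is easy to verify numerically. A sketch with \( \lambda = 4 \) (an illustrative rate), truncating the infinite support at a point where the remaining tail mass is negligible:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4.0
# Truncate the infinite support; beyond x = 100 the tail mass is negligible here.
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
var = sum((x - lam) ** 2 * poisson_pmf(x, lam) for x in range(100))
# Both should be close to lam: E(X) = V(X) = lambda.
```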


4) Normal (Gaussian) distribution (measurement noise) #

Use when: values cluster around a mean with symmetric “bell-shaped” variation.

Parameters: mean \( \mu \), standard deviation \( \sigma \)

\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]

Standard normal: \( Z\sim N(0,1) \).

Why Gaussian appears everywhere: many small independent effects add up to something close to normal (the central limit theorem). In ML: assuming Gaussian noise on the targets makes maximum-likelihood fitting equivalent to minimising squared-error loss.
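The noise-to-loss connection follows directly from the density: \( -\log f(x) = \frac{(x-\mu)^2}{2\sigma^2} + \text{const} \), so comparing fits depends only on squared errors. A small sketch (parameter values are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def nll(x, mu, sigma):
    """Negative log-likelihood: squared error / (2 sigma^2) plus a constant."""
    return -math.log(normal_pdf(x, mu, sigma))

# With mu = 0, sigma = 1, the NLL difference between two points equals
# the difference in their squared errors divided by 2: (2^2 - 1^2) / 2 = 1.5
diff = nll(2.0, 0.0, 1.0) - nll(1.0, 0.0, 1.0)
```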


5) t, Chi-square, F (intro) #

These distributions matter later for inference, but you should recognise where they come from.

5.1 t distribution (small-sample uncertainty in the mean) #

When sampling from a normal population with unknown variance, the statistic

\[ T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \]

follows a t distribution with \( n-1 \) degrees of freedom.

Key idea: t is like the normal distribution but with heavier tails (more uncertainty).
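Computing the statistic itself needs nothing beyond a sample mean and sample standard deviation. A sketch with a hypothetical sample (the data values and \( \mu_0 = 5 \) are made up for illustration); note `statistics.stdev` uses the \( n-1 \) denominator, matching \( S \):

```python
import math
from statistics import mean, stdev

# Hypothetical small sample (n = 6); values are illustrative only.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9]
mu_0 = 5.0                     # hypothesised population mean
n = len(data)

# T = (x-bar - mu) / (S / sqrt(n)), compared against a t distribution
# with n - 1 = 5 degrees of freedom.
t_stat = (mean(data) - mu_0) / (stdev(data) / math.sqrt(n))
```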


5.2 Chi-square distribution (variation / sum of squares) #

A chi-square distribution has one parameter: degrees of freedom \( \nu \).

It is positive-valued and is built from sums of squared normal variables.

You will see it in: confidence intervals and tests about variance.
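The "sum of squared normals" construction can be checked by simulation. A seeded sketch using only the standard library (sample size and \( \nu = 5 \) are illustrative); the sample mean should land near \( \nu \), since a chi-square variable has mean equal to its degrees of freedom:

```python
import random

random.seed(0)  # reproducible sketch

def chi_square_sample(df):
    """One draw built as a sum of df squared standard-normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

df = 5
samples = [chi_square_sample(df) for _ in range(20000)]
avg = sum(samples) / len(samples)   # should be close to df
```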


5.3 F distribution (ratio of variances) #

An F distribution is the ratio of two independent chi-square variables, each divided by its own degrees of freedom.

You will see it in: ANOVA and comparing two variances.
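The same simulation idea checks the F construction. A seeded sketch (the degrees of freedom \( d_1=5 \), \( d_2=10 \) are illustrative); for \( d_2>2 \) the F mean is \( d_2/(d_2-2) \), here 1.25:

```python
import random

random.seed(1)  # reproducible sketch

def chi_square_sample(df):
    """Sum of df squared standard-normal draws ~ chi-square(df)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

# F = (chi-square(d1) / d1) / (chi-square(d2) / d2)
d1, d2 = 5, 10
samples = [(chi_square_sample(d1) / d1) / (chi_square_sample(d2) / d2)
           for _ in range(20000)]
avg = sum(samples) / len(samples)   # for d2 > 2, E(F) = d2 / (d2 - 2) = 1.25
```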


6) Quick selection guide #

```mermaid
flowchart TD
  A[What type of data are you modelling?] --> B[Binary outcome]
  A --> C[Count of events]
  A --> D[Continuous measurement]

  B --> B1["Bernoulli (single trial)"]
  B --> B2["Binomial (n trials)"]

  C --> C1["Poisson (rate-based counts)"]

  D --> D1["Normal (measurement noise)"]
  D --> D2["t / Chi-square / F<br/>used in inference"]
```

7) Summary table #

| Distribution | Typical data | Parameters | Support |
|---|---|---|---|
| Bernoulli | yes/no | p | 0, 1 |
| Binomial | successes in n trials | n, p | 0..n |
| Poisson | counts in a fixed window | λ | 0, 1, 2, … |
| Normal | continuous measurements | μ, σ | all real numbers |
| t | mean inference (small n) | df | all real numbers |
| Chi-square | variance / sums of squares | df | ≥ 0 |
| F | ratio of variances | df1, df2 | ≥ 0 |

References #

  • Devore: Ch. 3 (Bernoulli, Binomial, Poisson) and Ch. 4 (Normal, Chi-square) plus later chapters for t and F
  • ISM sessions: use this as the syllabus-aligned guide
