Common Probability Distributions #
Once you can describe a random variable using a pmf or pdf, the next step is to use named distributions that appear repeatedly in real data and in ML models.
Key takeaway: Named distributions give you ready-made probability models for common patterns: binary outcomes, counts, and measurement noise.
1) Bernoulli distribution (binary) #
Use when: one trial has two outcomes (success/failure).
Support: \( x\in\{0,1\} \)
\[ P(X=x)=p^x(1-p)^{1-x},\quad x\in\{0,1\} \]
Mean and variance:
\[ E(X)=p,\quad V(X)=p(1-p) \]
ML connection: binary labels, click/no-click, churn/no-churn.
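As a quick check, the mean and variance formulas can be recovered directly from the pmf definition. A minimal sketch (the value p = 0.3 and the name `bernoulli_pmf` are illustrative, not from the text):

```python
# Sketch: evaluate the Bernoulli pmf and recover E(X) and V(X)
# directly from the definition. p = 0.3 is an arbitrary example value.
p = 0.3

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pmf(x, p) for x in (0, 1))
# mean equals p, and var equals p * (1 - p), matching the formulas above
```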
2) Binomial distribution (number of successes in n trials) #
Use when:
- fixed number of independent trials \( n \)
- constant success probability \( p \)
- count successes \( X \)
Support: \( x=0,1,2,\dots,n \)
\[ P(X=x)=\binom{n}{x}p^x(1-p)^{n-x} \]
Mean and variance:
\[ E(X)=np,\quad V(X)=np(1-p) \]
ML connection: how many “positive” outcomes in n repeated trials (quality checks, conversions, etc.).
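The pmf is easy to evaluate with the standard library. A minimal sketch (n = 10 and p = 0.5 are arbitrary example values; `binomial_pmf` is an illustrative name), verifying that the pmf sums to 1 and that the mean comes out to np:

```python
from math import comb  # binomial coefficient C(n, x)

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5  # arbitrary example: 10 trials, fair-coin success rate
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))   # sums to 1
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))  # equals n*p
```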
3) Poisson distribution (counts over time/space) #
Use when: you count events in a fixed window and events occur independently at an average rate.
Parameter: \( \lambda>0 \) (often written as \( m \) in some texts)
Support: \( x=0,1,2,\dots \)
\[ P(X=x)=e^{-\lambda}\frac{\lambda^x}{x!} \]
Mean and variance:
\[ E(X)=\lambda,\quad V(X)=\lambda \]
ML connection: call arrivals per minute, defects per metre, incidents per day.
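The equality of mean and variance is the Poisson's signature, and it can be checked numerically from the pmf. A minimal sketch (λ = 4 is an arbitrary example rate; the support is truncated at 100 terms, beyond which the tail is negligible):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0  # arbitrary example rate
# Truncate the infinite support at 100 terms; the remaining tail is tiny.
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
var = sum((x - mean)**2 * poisson_pmf(x, lam) for x in range(100))
# Both mean and var come out (numerically) equal to lam.
```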
4) Normal (Gaussian) distribution (measurement noise) #
Use when: values cluster around a mean with symmetric “bell-shaped” variation.
Parameters: mean \( \mu \), standard deviation \( \sigma \)
\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]
Standard normal: \( Z\sim N(0,1) \).
Why Gaussian appears everywhere: Many small independent effects add up to something close to normal. In ML: Gaussian noise assumptions lead to squared-error loss.
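One concrete consequence of the bell shape: about 68% of the probability mass lies within one standard deviation of the mean. A minimal sketch checking this by integrating the pdf numerically (μ = 10 and σ = 2 are arbitrary example parameters):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 10.0, 2.0  # arbitrary example parameters
step = 0.001
# Riemann sum of the pdf over [mu - sigma, mu + sigma]:
prob = sum(normal_pdf(mu - sigma + i * step, mu, sigma) * step
           for i in range(int(2 * sigma / step)))
# prob comes out close to 0.683 -- the "68% within one sigma" rule
```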
5) t, Chi-square, F (intro) #
These distributions matter later for inference, but you should recognise where they come from.
5.1 t distribution (small-sample uncertainty in the mean) #
When sampling from a normal population with unknown variance, the statistic
\[ T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \]
follows a t distribution with \( n-1 \) degrees of freedom, where \( S \) is the sample standard deviation.
Key idea: t is like the normal distribution but with heavier tails (more uncertainty).
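A minimal sketch of computing the statistic itself from one small sample (the seed, sample size n = 8, and population parameters are arbitrary; this produces a single draw of T, not its distribution):

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
mu, n = 5.0, 8  # true mean and a deliberately small sample size
sample = [random.gauss(mu, 2.0) for _ in range(n)]

xbar = statistics.mean(sample)     # sample mean, X-bar
s = statistics.stdev(sample)       # sample standard deviation, S
t = (xbar - mu) / (s / n**0.5)     # t statistic with n - 1 = 7 df
```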
5.2 Chi-square distribution (variation / sum of squares) #
A chi-square distribution has one parameter: degrees of freedom \( \nu \).
It is positive-valued and is built from sums of squared normal variables.
You will see it in: confidence intervals and tests about variance.
5.3 F distribution (ratio of variances) #
An F distribution is related to a ratio of two scaled chi-square variables.
You will see it in: ANOVA and comparing two variances.
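Both constructions above can be made concrete by simulation. A minimal sketch (the seed and the degrees of freedom df1 = 5, df2 = 10 are arbitrary): a chi-square draw is a sum of squared standard normals, and an F draw is a ratio of two independent chi-squares, each divided by its own degrees of freedom:

```python
import random

random.seed(1)  # fixed seed for reproducibility

def chi_square_sample(df):
    """One draw from a chi-square: sum of df squared standard normals."""
    return sum(random.gauss(0, 1)**2 for _ in range(df))

df1, df2 = 5, 10  # arbitrary degrees of freedom
# F variable: ratio of two independent chi-squares, each divided by its df
f = (chi_square_sample(df1) / df1) / (chi_square_sample(df2) / df2)
# Both the chi-square draws and f are strictly positive,
# matching the supports in the summary table.
```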
6) Quick selection guide #
```mermaid
flowchart TD
    A[What type of data are you modelling?] --> B[Binary outcome]
    A --> C[Count of events]
    A --> D[Continuous measurement]
    B --> B1["Bernoulli (single trial)"]
    B --> B2["Binomial (n trials)"]
    C --> C1["Poisson (rate-based counts)"]
    D --> D1["Normal (measurement noise)"]
    D --> D2["t / Chi-square / F<br/>used in inference"]
```
7) Summary table #
| Distribution | Typical data | Parameters | Support |
|---|---|---|---|
| Bernoulli | yes/no | p | 0, 1 |
| Binomial | successes in n trials | n, p | 0..n |
| Poisson | counts in a fixed window | λ | 0,1,2,… |
| Normal | continuous measurements | μ, σ | all real numbers |
| t | mean inference (small n) | df | all real numbers |
| Chi-square | variance / sums of squares | df | ≥ 0 |
| F | ratio of variances | df1, df2 | ≥ 0 |
References #
- Devore: Ch. 3 (Bernoulli, Binomial, Poisson) and Ch. 4 (Normal, Chi-square) plus later chapters for t and F
- ISM sessions: use this as the syllabus-aligned guide