Stats Formula Sheet #
Keep this page as a quick reference of definitions + formulas.
Notation #
- Sample size: \( n \) (sample), \( N \) (population)
- Mean: \( \bar{x} \) (sample), \( \mu \) (population)
- Variance: \( s^2 \) (sample), \( \sigma^2 \) (population)
- Standard deviation: \( s \) (sample), \( \sigma \) (population)
Module 1: Basic Statistics #
Measures of Central Tendency #
Sample mean (ungrouped):
\[ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \]
Population mean:
\[ \mu=\frac{1}{N}\sum_{i=1}^{N}x_i \]
Mean (grouped / frequency table): $m_i$ = class midpoint.
\[ \bar{x}=\frac{\sum_i f_i m_i}{\sum_i f_i} \]
Median (ungrouped):
- Odd $n$: the middle value after sorting
- Even $n$: the average of the two middle values
Mode (ungrouped):
- value with the highest frequency
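These three measures can be checked with Python's standard `statistics` module; a quick sketch with made-up data:

```python
import statistics

data = [2, 3, 3, 5, 7, 7, 7, 9]  # made-up sample

mean = statistics.mean(data)      # (2+3+3+5+7+7+7+9)/8 = 43/8
median = statistics.median(data)  # even n: average of the two middle values
mode = statistics.mode(data)      # most frequent value
```

Here `mean` is 5.375, `median` is 6.0 (average of 5 and 7), and `mode` is 7.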
Empirical relationship (moderately skewed data):
\[ \text{Mode}\approx 3\,\text{Median}-2\,\text{Mean} \]
Measures of Variability #
Range:
\[ \text{Range}=x_{\max}-x_{\min} \]
Sum of squares (SS):
\[ SS=\sum_{i=1}^{n}(x_i-\bar{x})^2 \]
Population variance and SD:
\[ \sigma^2=\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N},\qquad \sigma=\sqrt{\sigma^2} \]
Sample variance and SD:
\[ s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}=\frac{SS}{n-1},\qquad s=\sqrt{s^2} \]
Computational shortcut (in expectation form):
\[ \operatorname{Var}(X)=E(X^2)-[E(X)]^2 \]
Coefficient of variation (CV):
\[ CV=\frac{s}{\bar{x}}\times 100\% \]
Five-number summary, IQR, and Outliers #
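The quantities in this section can be computed in a few lines of pure Python; a sketch using the median-of-halves quartile convention (quartile conventions differ between textbooks and libraries, so other tools may give slightly different $Q_1$ and $Q_3$):

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves convention.
    Other quartile conventions give slightly different values."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    lower = s[:mid]               # lower half (median excluded when n is odd)
    upper = s[mid + (n % 2):]     # upper half
    return s[0], statistics.median(lower), statistics.median(s), statistics.median(upper), s[-1]

data = [1, 3, 4, 7, 8, 9, 11, 15]          # made-up sample
mn, q1, q2, q3, mx = five_number_summary(data)
iqr = q3 - q1                              # interquartile range
qd = iqr / 2                               # quartile deviation
lower_fence = q1 - 1.5 * iqr               # boxplot rule
upper_fence = q3 + 1.5 * iqr
```

For this sample, $Q_1 = 3.5$, $Q_3 = 10.0$, so $IQR = 6.5$ and the fences are $-6.25$ and $19.75$; no point lies outside them.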
Five-number summary: \( \min,\; Q_1,\; Q_2\ (\text{median}),\; Q_3,\; \max \)
Interquartile range (IQR):
\[ IQR=Q_3-Q_1 \]
Quartile deviation (QD):
\[ QD=\frac{IQR}{2} \]
Outlier fences (boxplot rule):
\[ \text{Lower fence}=Q_1-1.5\,IQR,\qquad \text{Upper fence}=Q_3+1.5\,IQR \]
Major outlier fences (sometimes used):
\[ Q_1-3\,IQR,\qquad Q_3+3\,IQR \]
Module 1: Basic Probability #
Axioms #
\[ P(S)=1,\qquad 0\le P(A)\le 1 \]
If $A\cap B=\varnothing$ (mutually exclusive):
\[ P(A\cup B)=P(A)+P(B) \]
Core rules #
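These rules can be verified by brute-force counting over a small, equally likely sample space; a sketch with two fair dice (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# Sample space: all 36 ordered outcomes of two fair dice.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(event) by counting, since all outcomes are equally likely."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] + w[1] == 7      # sum is 7
B = lambda w: w[0] == 1             # first die shows 1

p_not_a = prob(lambda w: not A(w))                       # complement rule
lhs = prob(lambda w: A(w) or B(w))                       # P(A ∪ B) directly
rhs = prob(A) + prob(B) - prob(lambda w: A(w) and B(w))  # addition rule
```

Here `p_not_a` is $1 - 1/6 = 5/6$, and both sides of the addition rule equal $11/36$.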
Complement:
\[ P(A^c)=1-P(A) \]
Addition rule (general):
\[ P(A\cup B)=P(A)+P(B)-P(A\cap B) \]
Conditional probability + multiplication #
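The definitions in this section can be checked the same counting way; a sketch with two fair dice:

```python
from fractions import Fraction

# Two fair dice, all 36 ordered outcomes equally likely.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] + w[1] >= 10     # sum is at least 10
B = lambda w: w[0] == 6             # first die shows 6

p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_given_b = p_a_and_b / prob(B)   # P(A | B) = P(A ∩ B) / P(B)
```

Here $P(A\cap B)=3/36=1/12$ and $P(A\mid B)=(1/12)/(1/6)=1/2$, and the multiplication rule recovers $P(A\cap B)=P(A\mid B)\,P(B)$.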
Conditional probability:
\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)}\quad (P(B)>0) \]
Multiplication rule (two events):
\[ P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A) \]
Multiplication rule (three events):
\[ P(A\cap B\cap C)=P(A)\,P(B\mid A)\,P(C\mid A\cap B) \]
Independence #
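The product test for independence can also be checked by counting; a sketch where the two events concern separate dice, so independence is expected:

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair dice

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] % 2 == 0         # first die is even
B = lambda w: w[1] == 3             # second die shows 3

# Independence test: does P(A ∩ B) equal P(A) P(B)?
independent = prob(lambda w: A(w) and B(w)) == prob(A) * prob(B)
```

Here $P(A)=1/2$, $P(B)=1/6$, and $P(A\cap B)=3/36=1/12$, so the product test passes.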
Events $A$ and $B$ are independent iff:
\[ P(A\cap B)=P(A)P(B) \]
Equivalent tests (when defined):
\[ P(A\mid B)=P(A),\qquad P(B\mid A)=P(B) \]
Module 2: Total Probability + Bayes’ Theorem #
Total probability #
If $E_1,\dots,E_n$ are mutually exclusive and exhaustive (a partition of $S$):
\[ P(A)=\sum_{i=1}^{n} P(A\mid E_i)\,P(E_i) \]
Special case (two-way split):
\[ P(A)=P(A\mid B)P(B)+P(A\mid B^c)P(B^c) \]
Bayes’ theorem (two events) #
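A classic numeric illustration with made-up rates: a test with 99% sensitivity and a 5% false-positive rate for a disease with 1% prevalence. Total probability gives $P(+)$ and Bayes' theorem gives $P(D\mid +)$:

```python
p_d = 0.01                 # prevalence P(D)
p_pos_given_d = 0.99       # sensitivity P(+ | D)
p_pos_given_not_d = 0.05   # false-positive rate P(+ | D^c)

# Total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes: P(D | +) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
```

Here $P(+) = 0.0594$ and $P(D\mid +) = 1/6 \approx 0.167$: even with a positive result, only about 17% of those tested have the disease, because the disease is rare.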
\[ P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)} \]
Bayes’ theorem (multiple hypotheses) #
\[ P(E_i\mid A)=\frac{P(A\mid E_i)P(E_i)}{\sum_{j=1}^{n} P(A\mid E_j)P(E_j)} \]
Bayesian Learning #
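A sketch of posterior and MAP computation over a tiny hypothesis space (the priors and likelihoods below are made up for illustration):

```python
# Three hypotheses with priors P(h) and likelihoods P(D | h).
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

# Evidence P(D) via total probability over the hypothesis partition.
p_d = sum(likelihoods[h] * priors[h] for h in priors)

# Posteriors P(h | D) by Bayes' theorem.
posteriors = {h: likelihoods[h] * priors[h] / p_d for h in priors}

# MAP hypothesis: argmax of the (unnormalised) posterior P(D | h) P(h).
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
```

Note that the MAP choice never needs $P(D)$: the denominator is the same for every hypothesis, so comparing $P(D\mid h)P(h)$ suffices. Here `h_map` is `"h2"` with posterior 0.6.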
Bayes for a hypothesis $h$ and data $D$:
\[ P(h\mid D)=\frac{P(D\mid h)P(h)}{P(D)} \]
MAP hypothesis:
\[ h_{MAP}=\arg\max_{h\in H} P(h\mid D)=\arg\max_{h\in H} P(D\mid h)P(h) \]
Maximum likelihood (uniform prior):
\[ h_{ML}=\arg\max_{h\in H} P(D\mid h) \]
Naïve Bayes (Classifier) #
Conditional independence assumption #
\[ P(X_1,\dots,X_n\mid Y)=\prod_{j=1}^{n} P(X_j\mid Y) \]
Decision rule (classification) #
\[ \hat{y}=\arg\max_{y} \; P(y)\prod_{j=1}^{n} P(x_j\mid y) \]
Laplace smoothing (counts / text) #
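A sketch of the smoothed estimate for word counts (the counts below are hypothetical, and `smoothed_prob` is a name chosen here, not a library function; $k=1$):

```python
def smoothed_prob(count_wc, count_c, vocab_size, k=1):
    """Laplace-smoothed P(w | c) = (count(w,c) + k) / (count(c) + k|V|)."""
    return (count_wc + k) / (count_c + k * vocab_size)

# Hypothetical counts: word seen 3 times in class c, 100 class tokens, |V| = 20.
p_seen = smoothed_prob(3, 100, 20)     # (3 + 1) / (100 + 20) = 4/120
p_unseen = smoothed_prob(0, 100, 20)   # unseen word still gets mass: 1/120
```

The point of smoothing is visible in `p_unseen`: without it, one unseen word would zero out the whole naïve Bayes product.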
Vocabulary size is $|V|$ and smoothing constant is $k$ (often 1).
\[ P(w\mid c)=\frac{\operatorname{count}(w,c)+k}{\operatorname{count}(c)+k|V|} \]
Module 3: Random Variables #
RV + distribution functions #
Random variable (as a function):
\[ X:S\to\mathbb{R} \]
Discrete pmf:
\[ p(x)=P(X=x),\qquad p(x)\ge 0,\qquad \sum_x p(x)=1 \]
Continuous pdf:
\[ f(x)\ge 0,\qquad \int_{-\infty}^{\infty} f(x)\,dx=1 \]
Interval probability:
\[ P(a\le X\le b)=\int_{a}^{b} f(x)\,dx \]
CDF (both cases):
\[ F(x)=P(X\le x) \]
Expectation and variance #
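The definitions in this section can be checked numerically on a fair die's pmf; a sketch:

```python
# Discrete pmf for a fair die: p(x) = 1/6 for x = 1..6.
pmf = {x: 1/6 for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())       # E(X) = 3.5
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
var_x = e_x2 - e_x**2                          # Var(X) = E(X^2) - [E(X)]^2

# Linear transformation rules with a = 2, b = 3, computed from scratch.
a, b = 2, 3
e_y = sum((a * x + b) * p for x, p in pmf.items())
var_y = sum((a * x + b - e_y)**2 * p for x, p in pmf.items())
```

The direct computations agree with the rules: $E(2X+3)=2E(X)+3=10$ and $\operatorname{Var}(2X+3)=4\operatorname{Var}(X)$.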
Expectation (discrete):
\[ E(X)=\sum_x x\,p(x) \]
Expectation (continuous):
\[ E(X)=\int_{-\infty}^{\infty} x\,f(x)\,dx \]
Variance (both cases):
\[ \operatorname{Var}(X)=E[(X-\mu)^2]=E(X^2)-[E(X)]^2 \]
Rules:
\[ E(aX+b)=aE(X)+b \]
\[ \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X) \]
Two Random Variables (Joint / Marginal / Conditional) #
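The joint/marginal/conditional machinery in this section can be worked through on a small made-up table; a sketch using exact fractions:

```python
from fractions import Fraction

F = Fraction
# Made-up joint pmf p(x, y) on {0,1} x {0,1}; cells sum to 1.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8),
         (1, 0): F(2, 8), (1, 1): F(2, 8)}

# Marginals: sum out the other variable.
p_x = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

# Conditional pmf p(y | x=0) = p(0, y) / p_x(0).
p_y_given_x0 = {y: joint[(0, y)] / p_x[0] for y in (0, 1)}

# Independence check: p(x, y) == p_x(x) p_y(y) in every cell?
independent = all(joint[(x, y)] == p_x[x] * p_y[y]
                  for x in (0, 1) for y in (0, 1))
```

Here $p_X(0)=1/2$, $p_{Y\mid X}(1\mid 0)=3/4$, and the cell $(0,0)$ fails the product test, so $X$ and $Y$ are not independent.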
Joint pmf (discrete):
\[ p(x,y)=P(X=x,\,Y=y) \]
Marginals (discrete):
\[ p_X(x)=\sum_y p(x,y),\qquad p_Y(y)=\sum_x p(x,y) \]
Conditional pmf (discrete):
\[ p_{Y\mid X}(y\mid x)=\frac{p(x,y)}{p_X(x)}\quad (p_X(x)>0) \]
Independence (discrete):
\[ X\perp Y\iff p(x,y)=p_X(x)p_Y(y) \]
Covariance + correlation:
\[ \operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y) \]
\[ \rho_{XY}=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y} \]
Common Distributions #
Bernoulli distribution #
Parameter: $p$
\[ P(X=1)=p,\quad P(X=0)=1-p \]
\[ E(X)=p,\quad \operatorname{Var}(X)=p(1-p) \]
Binomial distribution #
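The binomial pmf can be computed directly from the formula with the standard library's `math.comb`; a sketch that also checks the mean against $np$:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
total = sum(binom_pmf(k, n, p) for k in range(n + 1))     # pmf sums to 1
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))  # should equal n p
```

For $n=10$, $p=0.5$, the most likely value is $k=5$ with $P(X=5)=252/1024\approx 0.246$, and the computed mean matches $np=5$.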
Parameters: $n, p$
\[ P(X=k)=\binom{n}{k}p^k(1-p)^{n-k},\quad k=0,1,\dots,n \]
\[ E(X)=np,\quad \operatorname{Var}(X)=np(1-p) \]
Poisson distribution #
Parameter: $\lambda$
\[ P(X=k)=e^{-\lambda}\frac{\lambda^k}{k!},\quad k=0,1,2,\dots \]
\[ E(X)=\lambda,\quad \operatorname{Var}(X)=\lambda \]
Normal distribution #
Parameters: $\mu, \sigma^2$
\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]
Standardisation:
\[ Z=\frac{X-\mu}{\sigma}\sim N(0,1) \]
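The density and standardisation formulas translate directly to code; a sketch with made-up parameters ($\mu=70$, $\sigma=10$, e.g. exam scores):

```python
from math import sqrt, pi, exp

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def z_score(x, mu, sigma):
    """Standardisation: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

z = z_score(85, 70, 10)        # 85 is 1.5 SDs above the mean
peak = normal_pdf(70, 70, 10)  # density at the mean: 1 / (sigma * sqrt(2*pi))
```

Standardising lets any normal question be answered with the single $N(0,1)$ table: here a score of 85 corresponds to $z = 1.5$.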