Stats Formula Sheet #
Keep this page as a quick reference of definitions + formulas.
Notation #
- Sample size: \( n \) (sample), \( N \) (population)
- Mean: \( \bar{x} \) (sample), \( \mu \) (population)
- Variance: \( s^2 \) (sample), \( \sigma^2 \) (population)
- Standard deviation: \( s \) (sample), \( \sigma \) (population)
Module 1: Basic Statistics #
Measures of Central Tendency #
Sample mean (ungrouped):
\[ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \]
Population mean:
\[ \mu=\frac{1}{N}\sum_{i=1}^{N}x_i \]
Mean (grouped / frequency table): $m_i$ = class midpoint.
\[ \bar{x}=\frac{\sum_i f_i m_i}{\sum_i f_i} \]
Median (ungrouped):
- Odd $n$: the middle value after sorting
- Even $n$: the average of the two middle values
Mode (ungrouped):
- value with the highest frequency
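These three measures can be checked with Python's standard `statistics` module; a quick sketch with made-up data:

```python
import statistics

data = [2, 3, 3, 5, 7, 7, 7, 9]  # made-up sample

mean = statistics.mean(data)      # (2+3+3+5+7+7+7+9)/8 = 43/8
median = statistics.median(data)  # even n: average of the two middle values
mode = statistics.mode(data)      # most frequent value
```

Here `mean` is 5.375, `median` is 6.0 (average of 5 and 7), and `mode` is 7.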
Empirical relationship (moderately skewed data):
\[ \text{Mode}\approx 3\,\text{Median}-2\,\text{Mean} \]
Measures of Variability #
Range:
\[ \text{Range}=x_{\max}-x_{\min} \]
Sum of squares (SS):
\[ SS=\sum_{i=1}^{n}(x_i-\bar{x})^2 \]
Population variance and SD:
\[ \sigma^2=\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N},\qquad \sigma=\sqrt{\sigma^2} \]
Sample variance and SD:
\[ s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}=\frac{SS}{n-1},\qquad s=\sqrt{s^2} \]
Computational shortcut (in expectation form):
\[ \operatorname{Var}(X)=E(X^2)-[E(X)]^2 \]
Coefficient of variation (CV):
\[ CV=\frac{s}{\bar{x}}\times 100\% \]
Five-number summary, IQR, and Outliers #
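The quantities in this section can be computed in a few lines of pure Python; a sketch using the median-of-halves quartile convention (quartile conventions differ between textbooks and libraries, so other tools may give slightly different $Q_1$ and $Q_3$):

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves convention.
    Other quartile conventions give slightly different values."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    lower = s[:mid]               # lower half (median excluded when n is odd)
    upper = s[mid + (n % 2):]     # upper half
    return s[0], statistics.median(lower), statistics.median(s), statistics.median(upper), s[-1]

data = [1, 3, 4, 7, 8, 9, 11, 15]          # made-up sample
mn, q1, q2, q3, mx = five_number_summary(data)
iqr = q3 - q1                              # interquartile range
qd = iqr / 2                               # quartile deviation
lower_fence = q1 - 1.5 * iqr               # boxplot rule
upper_fence = q3 + 1.5 * iqr
```

For this sample, $Q_1 = 3.5$, $Q_3 = 10.0$, so $IQR = 6.5$ and the fences are $-6.25$ and $19.75$; no point lies outside them.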
Five-number summary: \( \min,\; Q_1,\; Q_2\ (\text{median}),\; Q_3,\; \max \)
Interquartile range (IQR):
\[ IQR=Q_3-Q_1 \]
Quartile deviation (QD):
\[ QD=\frac{IQR}{2} \]
Outlier fences (boxplot rule):
\[ \text{Lower fence}=Q_1-1.5\,IQR,\qquad \text{Upper fence}=Q_3+1.5\,IQR \]
Major outlier fences (sometimes used):
\[ Q_1-3\,IQR,\qquad Q_3+3\,IQR \]
Module 1: Basic Probability #
Axioms #
\[ P(S)=1,\qquad 0\le P(A)\le 1 \]
If $A\cap B=\varnothing$ (mutually exclusive):
\[ P(A\cup B)=P(A)+P(B) \]
Core rules #
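These rules can be verified by brute-force counting over a small, equally likely sample space; a sketch with two fair dice (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# Sample space: all 36 ordered outcomes of two fair dice.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(event) by counting, since all outcomes are equally likely."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] + w[1] == 7      # sum is 7
B = lambda w: w[0] == 1             # first die shows 1

p_not_a = prob(lambda w: not A(w))                       # complement rule
lhs = prob(lambda w: A(w) or B(w))                       # P(A ∪ B) directly
rhs = prob(A) + prob(B) - prob(lambda w: A(w) and B(w))  # addition rule
```

Here `p_not_a` is $1 - 1/6 = 5/6$, and both sides of the addition rule equal $11/36$.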
Complement:
\[ P(A^c)=1-P(A) \]
Addition rule (general):
\[ P(A\cup B)=P(A)+P(B)-P(A\cap B) \]
Conditional probability + multiplication #
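The definitions in this section can be checked the same counting way; a sketch with two fair dice:

```python
from fractions import Fraction

# Two fair dice, all 36 ordered outcomes equally likely.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] + w[1] >= 10     # sum is at least 10
B = lambda w: w[0] == 6             # first die shows 6

p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_given_b = p_a_and_b / prob(B)   # P(A | B) = P(A ∩ B) / P(B)
```

Here $P(A\cap B)=3/36=1/12$ and $P(A\mid B)=(1/12)/(1/6)=1/2$, and the multiplication rule recovers $P(A\cap B)=P(A\mid B)\,P(B)$.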
Conditional probability:
\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)}\quad (P(B)>0) \]
Multiplication rule (two events):
\[ P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A) \]
Multiplication rule (three events):
\[ P(A\cap B\cap C)=P(A)\,P(B\mid A)\,P(C\mid A\cap B) \]
Independence #
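The product test for independence can also be checked by counting; a sketch where the two events concern separate dice, so independence is expected:

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair dice

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] % 2 == 0         # first die is even
B = lambda w: w[1] == 3             # second die shows 3

# Independence test: does P(A ∩ B) equal P(A) P(B)?
independent = prob(lambda w: A(w) and B(w)) == prob(A) * prob(B)
```

Here $P(A)=1/2$, $P(B)=1/6$, and $P(A\cap B)=3/36=1/12$, so the product test passes.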
Events $A$ and $B$ are independent iff:
\[ P(A\cap B)=P(A)P(B) \]
Equivalent tests (when defined):
\[ P(A\mid B)=P(A),\qquad P(B\mid A)=P(B) \]
Module 2: Total Probability + Bayes’ Theorem #
Total probability #
If $E_1,\dots,E_n$ are mutually exclusive and exhaustive (a partition of $S$):
\[ P(A)=\sum_{i=1}^{n} P(A\mid E_i)\,P(E_i) \]
Special case (two-way split):
\[ P(A)=P(A\mid B)P(B)+P(A\mid B^c)P(B^c) \]
Bayes’ theorem (two events) #
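A classic numeric illustration with made-up rates: a test with 99% sensitivity and a 5% false-positive rate for a disease with 1% prevalence. Total probability gives $P(+)$ and Bayes' theorem gives $P(D\mid +)$:

```python
p_d = 0.01                 # prevalence P(D)
p_pos_given_d = 0.99       # sensitivity P(+ | D)
p_pos_given_not_d = 0.05   # false-positive rate P(+ | D^c)

# Total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes: P(D | +) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
```

Here $P(+) = 0.0594$ and $P(D\mid +) = 1/6 \approx 0.167$: even with a positive result, only about 17% of those tested have the disease, because the disease is rare.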
\[ P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)} \]
Bayes’ theorem (multiple hypotheses) #
\[ P(E_i\mid A)=\frac{P(A\mid E_i)P(E_i)}{\sum_{j=1}^{n} P(A\mid E_j)P(E_j)} \]
Bayesian Learning #
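A sketch of posterior and MAP computation over a tiny hypothesis space (the priors and likelihoods below are made up for illustration):

```python
# Three hypotheses with priors P(h) and likelihoods P(D | h).
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.02, "h2": 0.10, "h3": 0.05}

# Evidence P(D) via total probability over the hypothesis partition.
p_d = sum(likelihoods[h] * priors[h] for h in priors)

# Posteriors P(h | D) by Bayes' theorem.
posteriors = {h: likelihoods[h] * priors[h] / p_d for h in priors}

# MAP hypothesis: argmax of the (unnormalised) posterior P(D | h) P(h).
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
```

Note that the MAP choice never needs $P(D)$: the denominator is the same for every hypothesis, so comparing $P(D\mid h)P(h)$ suffices. Here `h_map` is `"h2"` with posterior 0.6.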
Bayes for a hypothesis $h$ and data $D$:
\[ P(h\mid D)=\frac{P(D\mid h)P(h)}{P(D)} \]
MAP hypothesis:
\[ h_{MAP}=\arg\max_{h\in H} P(h\mid D)=\arg\max_{h\in H} P(D\mid h)P(h) \]
Maximum likelihood (uniform prior):
\[ h_{ML}=\arg\max_{h\in H} P(D\mid h) \]
Naïve Bayes (Classifier) #
Conditional independence assumption #
\[ P(X_1,\dots,X_n\mid Y)=\prod_{j=1}^{n} P(X_j\mid Y) \]
Decision rule (classification) #
\[ \hat{y}=\arg\max_{y} \; P(y)\prod_{j=1}^{n} P(x_j\mid y) \]
Laplace smoothing (counts / text) #
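A sketch of the smoothed estimate for word counts (the counts below are hypothetical, and `smoothed_prob` is a name chosen here, not a library function; $k=1$):

```python
def smoothed_prob(count_wc, count_c, vocab_size, k=1):
    """Laplace-smoothed P(w | c) = (count(w,c) + k) / (count(c) + k|V|)."""
    return (count_wc + k) / (count_c + k * vocab_size)

# Hypothetical counts: word seen 3 times in class c, 100 class tokens, |V| = 20.
p_seen = smoothed_prob(3, 100, 20)     # (3 + 1) / (100 + 20) = 4/120
p_unseen = smoothed_prob(0, 100, 20)   # unseen word still gets mass: 1/120
```

The point of smoothing is visible in `p_unseen`: without it, one unseen word would zero out the whole naïve Bayes product.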
Vocabulary size is $|V|$ and smoothing constant is $k$ (often 1).
\[ P(w\mid c)=\frac{\operatorname{count}(w,c)+k}{\operatorname{count}(c)+k|V|} \]
Module 3: Random Variables #
RV + distribution functions #
Random variable (as a function):
\[ X:S\to\mathbb{R} \]
Discrete pmf:
\[ p(x)=P(X=x),\qquad p(x)\ge 0,\qquad \sum_x p(x)=1 \]
Continuous pdf:
\[ f(x)\ge 0,\qquad \int_{-\infty}^{\infty} f(x)\,dx=1 \]
Interval probability:
\[ P(a\le X\le b)=\int_{a}^{b} f(x)\,dx \]
CDF (both cases):
\[ F(x)=P(X\le x) \]
Expectation and variance #
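The definitions in this section can be checked numerically on a fair die's pmf; a sketch:

```python
# Discrete pmf for a fair die: p(x) = 1/6 for x = 1..6.
pmf = {x: 1/6 for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())       # E(X) = 3.5
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
var_x = e_x2 - e_x**2                          # Var(X) = E(X^2) - [E(X)]^2

# Linear transformation rules with a = 2, b = 3, computed from scratch.
a, b = 2, 3
e_y = sum((a * x + b) * p for x, p in pmf.items())
var_y = sum((a * x + b - e_y)**2 * p for x, p in pmf.items())
```

The direct computations agree with the rules: $E(2X+3)=2E(X)+3=10$ and $\operatorname{Var}(2X+3)=4\operatorname{Var}(X)$.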
Expectation (discrete):
\[ E(X)=\sum_x x\,p(x) \]
Expectation (continuous):
\[ E(X)=\int_{-\infty}^{\infty} x\,f(x)\,dx \]
Variance (both cases):
\[ \operatorname{Var}(X)=E[(X-\mu)^2]=E(X^2)-[E(X)]^2 \]
Rules:
\[ E(aX+b)=aE(X)+b \]
\[ \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X) \]
Two Random Variables (Joint / Marginal / Conditional) #
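The joint/marginal/conditional machinery in this section can be worked through on a small made-up table; a sketch using exact fractions:

```python
from fractions import Fraction

F = Fraction
# Made-up joint pmf p(x, y) on {0,1} x {0,1}; cells sum to 1.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8),
         (1, 0): F(2, 8), (1, 1): F(2, 8)}

# Marginals: sum out the other variable.
p_x = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

# Conditional pmf p(y | x=0) = p(0, y) / p_x(0).
p_y_given_x0 = {y: joint[(0, y)] / p_x[0] for y in (0, 1)}

# Independence check: p(x, y) == p_x(x) p_y(y) in every cell?
independent = all(joint[(x, y)] == p_x[x] * p_y[y]
                  for x in (0, 1) for y in (0, 1))
```

Here $p_X(0)=1/2$, $p_{Y\mid X}(1\mid 0)=3/4$, and the cell $(0,0)$ fails the product test, so $X$ and $Y$ are not independent.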
Joint pmf (discrete):
\[ p(x,y)=P(X=x,\,Y=y) \]
Marginals (discrete):
\[ p_X(x)=\sum_y p(x,y),\qquad p_Y(y)=\sum_x p(x,y) \]
Conditional pmf (discrete):
\[ p_{Y\mid X}(y\mid x)=\frac{p(x,y)}{p_X(x)}\quad (p_X(x)>0) \]
Independence (discrete):
\[ X\perp Y\iff p(x,y)=p_X(x)p_Y(y) \]
Covariance + correlation:
\[ \operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y) \]
\[ \rho_{XY}=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y} \]
Common Distributions #
Bernoulli distribution #
Parameter: $p$
\[ P(X=1)=p,\quad P(X=0)=1-p \]
\[ E(X)=p,\quad \operatorname{Var}(X)=p(1-p) \]
Binomial distribution #
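The binomial pmf can be computed directly from the formula with the standard library's `math.comb`; a sketch that also checks the mean against $np$:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
total = sum(binom_pmf(k, n, p) for k in range(n + 1))     # pmf sums to 1
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))  # should equal n p
```

For $n=10$, $p=0.5$, the most likely value is $k=5$ with $P(X=5)=252/1024\approx 0.246$, and the computed mean matches $np=5$.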
Parameters: $n, p$
\[ P(X=k)=\binom{n}{k}p^k(1-p)^{n-k},\quad k=0,1,\dots,n \]
\[ E(X)=np,\quad \operatorname{Var}(X)=np(1-p) \]
Poisson distribution #
Parameter: $\lambda$
\[ P(X=k)=e^{-\lambda}\frac{\lambda^k}{k!},\quad k=0,1,2,\dots \]
\[ E(X)=\lambda,\quad \operatorname{Var}(X)=\lambda \]
Normal distribution #
Parameters: $\mu, \sigma^2$
\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]
Standardisation:
\[ Z=\frac{X-\mu}{\sigma}\sim N(0,1) \]
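The density and standardisation formulas translate directly to code; a sketch with made-up parameters ($\mu=70$, $\sigma=10$, e.g. exam scores):

```python
from math import sqrt, pi, exp

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def z_score(x, mu, sigma):
    """Standardisation: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

z = z_score(85, 70, 10)        # 85 is 1.5 SDs above the mean
peak = normal_pdf(70, 70, 10)  # density at the mean: 1 / (sigma * sqrt(2*pi))
```

Standardising lets any normal question be answered with the single $N(0,1)$ table: here a score of 85 corresponds to $z = 1.5$.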