Formula Sheet

This page is a quick reference of definitions and formulas, grouped by module.


Notation #

  • Sample size: \( n \) , population size: \( N \)
  • Sample mean: \( \bar{x} \) , population mean: \( \mu \)
  • Sample variance: \( s^2 \) , population variance: \( \sigma^2 \)
  • Sample SD: \( s \) , population SD: \( \sigma \)
  • Complement: \( A^c \)
  • Intersection (“and”): \( A\cap B \) , union (“or”): \( A\cup B \)
  • Conditional probability: \( P(A\mid B) \)

1. Basic Probability & Statistics #

1.1 Measures of Central Tendency #

Arithmetic mean #

Sample mean (ungrouped):

\[ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \]

Population mean:

\[ \mu=\frac{1}{N}\sum_{i=1}^{N}x_i \]

Grouped data mean (frequency table, \( m_i \) = class midpoint):

\[ \bar{x}=\frac{\sum_i f_i m_i}{\sum_i f_i} \]
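
As a rough illustration of the sample mean and the grouped-data mean, the sketch below uses made-up values; the variable names and the small frequency table are assumptions for this example only.

```python
# Ungrouped sample mean: sum of the values divided by n.
data = [4, 8, 15, 16, 23, 42]            # made-up sample
sample_mean = sum(data) / len(data)

# Grouped mean: class midpoints m_i weighted by their frequencies f_i.
midpoints = [5, 15, 25]                  # hypothetical class midpoints
freqs = [2, 5, 3]                        # hypothetical class frequencies
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

print(sample_mean, grouped_mean)
```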

Median (location) #

  • If \( n \) is odd: median is the \( \left(\frac{n+1}{2}\right) \) -th value (after sorting)
  • If \( n \) is even: median is the average of the \( \left(\frac{n}{2}\right) \) -th and \( \left(\frac{n}{2}+1\right) \) -th values
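
The two position rules above can be sketched as a small function. The name `median` and the sample lists are illustrative assumptions.

```python
def median(values):
    """Median by position: the middle value, or the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # (n+1)/2-th value; subtract 1 to convert to a 0-based index
        return s[(n + 1) // 2 - 1]
    # average of the n/2-th and (n/2 + 1)-th values (1-based positions)
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(median([7, 1, 5]))        # odd n
print(median([7, 1, 5, 3]))     # even n
```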

Mode #

  • Mode = the value with the highest frequency
  • If multiple values tie for highest frequency → multimodal

Other means (useful in some datasets) #

Geometric mean:

\[ \bar{x}_G=\left(\prod_{i=1}^{n}x_i\right)^{1/n} \]

Weighted mean:

\[ \bar{x}_W=\frac{\sum_i w_i x_i}{\sum_i w_i} \]

Harmonic mean:

\[ \bar{x}_H=\frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}} \]

Root mean square:

\[ x_{rms}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2} \]
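
A minimal sketch of the four means above, assuming Python 3.8+ for `statistics.geometric_mean`. The data and weights are made up for illustration.

```python
import math
import statistics

x = [1, 2, 4, 8]            # made-up positive values
weights = [1, 1, 2, 4]      # hypothetical weights, one per value

geometric = statistics.geometric_mean(x)     # (x_1 * ... * x_n)^(1/n)
harmonic = statistics.harmonic_mean(x)       # n / sum(1/x_i)
weighted = sum(w * v for w, v in zip(weights, x)) / sum(weights)
rms = math.sqrt(sum(v * v for v in x) / len(x))

print(geometric, harmonic, weighted, rms)
```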

1.2 Measures of Variability #

Range #

\[ \text{Range}=x_{max}-x_{min} \]

Midrange:

\[ \text{MidRange}=\frac{x_{max}+x_{min}}{2} \]

Variance and standard deviation #

Population variance and SD:

\[ \sigma^2=\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}, \qquad \sigma=\sqrt{\sigma^2} \]

Sample variance and SD:

\[ s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}, \qquad s=\sqrt{s^2} \]
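
The only difference between the two forms is the divisor. The sketch below uses the standard-library `statistics` module on made-up data to show both.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up values with mean 5

pop_var = statistics.pvariance(data)     # divides by N
pop_sd = statistics.pstdev(data)
samp_var = statistics.variance(data)     # divides by n - 1
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd, samp_var, samp_sd)
```

The squared deviations here sum to 32, so the population variance is 32/8 and the sample variance is 32/7.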

Coefficient of variation (CV) #

\[ CV=\frac{\text{SD}}{\text{Mean}}\times 100\% \]

Covariance #

Sample covariance:

\[ s_{xy}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1} \]

Population covariance:

\[ \sigma_{xy}=\frac{\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)}{N} \]
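
Both covariance forms can be sketched with one helper. The function name `covariance` and its `sample` flag are assumptions made for this example.

```python
def covariance(x, y, sample=True):
    """Covariance of paired data; divides by n - 1 if sample, else by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)

x = [1, 2, 3, 4]     # made-up paired observations
y = [2, 4, 6, 8]

print(covariance(x, y))                  # sample form
print(covariance(x, y, sample=False))    # population form
```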

1.2.1 Five-number summary, IQR, and outliers #

Five-number summary:

  • minimum
  • \( Q_1 \)
  • median ( \( Q_2 \) )
  • \( Q_3 \)
  • maximum

Interquartile range:

\[ IQR=Q_3-Q_1 \]

Quartile deviation:

\[ QD=\frac{Q_3-Q_1}{2}=\frac{IQR}{2} \]

Coefficient of quartile deviation:

\[ CQD=\frac{Q_3-Q_1}{Q_3+Q_1} \]

Outlier fences (boxplot rule):

\[ \text{Lower fence}=Q_1-1.5\,IQR, \qquad \text{Upper fence}=Q_3+1.5\,IQR \]

Rule: any point outside \( [Q_1-1.5\,IQR,\;Q_3+1.5\,IQR] \) is flagged as a potential outlier.
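
A small sketch of the fence rule, assuming the quartiles have already been found. The data, the quartile values, and the helper name `outlier_fences` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the boxplot rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [3, 5, 6, 7, 8, 9, 30]    # made-up sorted data
q1, q3 = 5, 9                    # quartiles assumed known for this data

lower, upper = outlier_fences(q1, q3)
outliers = [v for v in data if v < lower or v > upper]

print(lower, upper, outliers)
```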

Quartiles, percentiles, deciles (position formulas) #

Quartile position:

\[ Q_k=\frac{k(n+1)}{4}\text{-th term} \qquad (k=1,2,3) \]

Percentile position:

\[ P_k=\frac{k(n+1)}{100}\text{-th term} \]

Decile position:

\[ D_k=\frac{k(n+1)}{10}\text{-th term} \]
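
The position formulas often give a fractional position. The sketch below assumes linear interpolation between neighbouring sorted values in that case, which is one common convention and not something this sheet specifies. The function names are hypothetical.

```python
def value_at_position(sorted_vals, pos):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    lo = int(pos)                  # whole part of the position
    frac = pos - lo                # fractional part
    if lo < 1:
        return sorted_vals[0]      # clamp below the first term
    if lo >= len(sorted_vals):
        return sorted_vals[-1]     # clamp above the last term
    return sorted_vals[lo - 1] + frac * (sorted_vals[lo] - sorted_vals[lo - 1])

def quantile(values, k, parts):
    """k-th quantile when the data is cut into `parts` (4, 10, or 100)."""
    s = sorted(values)
    return value_at_position(s, k * (len(s) + 1) / parts)

data = [2, 4, 6, 8, 10, 12, 14, 16]     # made-up data, n = 8
print(quantile(data, 1, 4))    # Q1 at position 2.25
print(quantile(data, 3, 4))    # Q3 at position 6.75
print(quantile(data, 5, 10))   # D5 at position 4.5
```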

1.2.2 Grouped data (frequency table) formulas #

Grouped data standard deviation (sample form, with \( m \) = class midpoint and \( n=\sum f \)):

\[ s=\sqrt{\frac{\sum f(m-\bar{x})^2}{n-1}} \]

Grouped data shortcut for variance (sample form):

\[ s^2=\frac{\sum f m^2-\frac{(\sum f m)^2}{n}}{n-1} \]
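
As a quick check that the definition and the shortcut agree, the sketch below evaluates both on a made-up frequency table. The midpoints and frequencies are assumptions for illustration.

```python
midpoints = [5, 15, 25]      # hypothetical class midpoints m
freqs = [2, 5, 3]            # hypothetical class frequencies f

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
mean = sum_fm / n

# Definition: sum of f * (m - mean)^2, divided by n - 1
var_definition = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)

# Shortcut: (sum f m^2 - (sum f m)^2 / n) / (n - 1)
var_shortcut = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

print(var_definition, var_shortcut)
```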

Quartiles for grouped data (common form):

  • \( L \) = lower class boundary of quartile class
  • \( w \) = class width
  • \( f \) = frequency of quartile class
  • \( C \) = cumulative frequency before quartile class

First quartile:

\[ Q_1=L+\frac{w}{f}\left(\frac{n}{4}-C\right) \]

Median (second quartile):

\[ Q_2=L+\frac{w}{f}\left(\frac{n}{2}-C\right) \]

Third quartile:

\[ Q_3=L+\frac{w}{f}\left(\frac{3n}{4}-C\right) \]
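
The three quartile formulas share one pattern: locate the class containing the target cumulative count, then interpolate inside it. A minimal sketch follows, assuming equal or unequal class widths given by a list of class boundaries. The function name and the table are hypothetical.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Quantile from a frequency table.

    boundaries: lower boundary of each class plus the last upper boundary
    freqs:      frequency of each class
    fraction:   0.25 for Q1, 0.5 for the median, 0.75 for Q3
    """
    n = sum(freqs)
    target = fraction * n
    cumulative = 0                       # C: cumulative frequency before the class
    for i, f in enumerate(freqs):
        if cumulative + f >= target:     # first class reaching the target count
            lower = boundaries[i]                      # L
            width = boundaries[i + 1] - boundaries[i]  # w
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

boundaries = [0, 10, 20, 30, 40]   # made-up class boundaries
freqs = [4, 6, 8, 2]               # made-up frequencies, n = 20

q1 = grouped_quantile(boundaries, freqs, 0.25)
q2 = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(q1, q2, q3)
```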

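Mode (grouped), where \( L \) is the lower class boundary of the modal class, \( w \) the class width, \( f_m \) the frequency of the modal class, and \( f_1 \) , \( f_2 \) the frequencies of the classes immediately before and after it:

\[ \text{Mode}=L+w\left(\frac{f_m-f_1}{2f_m-f_1-f_2}\right) \]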

1.3 Basic Probability concepts #

1.3.1 Axioms and basic rules #

Probability bounds:

\[ 0\le P(A)\le 1 \]

Complement:

\[ P(A^c)=1-P(A) \]

Addition rule (general):

\[ P(A\cup B)=P(A)+P(B)-P(A\cap B) \]

Mutually exclusive events: if \( A\cap B=\varnothing \) , then

\[ P(A\cup B)=P(A)+P(B) \]

Independent events:

\[ P(A\cap B)=P(A)\,P(B) \]
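
These rules can be checked by brute-force enumeration on a small sample space. The sketch below assumes two fair dice and uses exact fractions; the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

p_a = prob(first_is_six)
p_b = prob(second_is_six)
p_and = prob(lambda o: first_is_six(o) and second_is_six(o))
p_or = prob(lambda o: first_is_six(o) or second_is_six(o))
p_not_a = prob(lambda o: not first_is_six(o))

print(p_a, p_b, p_and, p_or, p_not_a)
```

Here the two events are independent, so `p_and` equals `p_a * p_b`, and the general addition rule gives `p_or`.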

1.3.2 Counting (permutations and combinations) #

Combinations:

\[ {N\choose n}=\frac{N!}{n!(N-n)!} \]

Permutations:

\[ {}^NP_n=\frac{N!}{(N-n)!} \]
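
Python 3.8+ provides both counts in the standard library. The sketch below compares `math.comb` and `math.perm` with the factorial formulas for one made-up case.

```python
import math

N, n = 5, 2    # choose 2 items from 5

combinations = math.comb(N, n)     # order does not matter
permutations = math.perm(N, n)     # order matters

# The same counts from the factorial formulas
comb_formula = math.factorial(N) // (math.factorial(n) * math.factorial(N - n))
perm_formula = math.factorial(N) // math.factorial(N - n)

print(combinations, permutations)
```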

2. Conditional Probability & Bayes theorem #

2.1 Conditional probability #

\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)}, \qquad P(B)>0 \]

Also:

\[ P(B\mid A)=\frac{P(A\cap B)}{P(A)}, \qquad P(A)>0 \]

2.2 Multiplication rule (joint probability) #

\[ P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A) \]

Chain rule (three events):

\[ P(A\cap B\cap C)=P(A)\,P(B\mid A)\,P(C\mid A\cap B) \]
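
A short enumeration sketch of conditional probability and the multiplication rule, again assuming two fair dice. The events "sum is 8" and "first die is even" are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))   # two fair dice

def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def sum_is_8(o):          # event A
    return o[0] + o[1] == 8

def first_is_even(o):     # event B
    return o[0] % 2 == 0

p_a = prob(sum_is_8)
p_b = prob(first_is_even)
p_ab = prob(lambda o: sum_is_8(o) and first_is_even(o))

p_a_given_b = p_ab / p_b      # P(A | B)
p_b_given_a = p_ab / p_a      # P(B | A)

print(p_a_given_b, p_b_given_a)
```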

2.2.1 Conditional probability for independent events #

If \( A \) and \( B \) are independent:

\[ P(A\mid B)=P(A), \qquad P(B\mid A)=P(B) \]

2.3 Bayes’ theorem #

Two-event Bayes:

\[ P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)} \]

Bayes with partition (hypotheses \( A_1,\dots,A_n \) are mutually exclusive and exhaustive):

\[ P(A_i\mid B)=\frac{P(A_i)P(B\mid A_i)}{\sum_{j=1}^{n}P(A_j)P(B\mid A_j)} \]
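
The partition form is a normalisation of prior times likelihood. The sketch below assumes a made-up screening example with two hypotheses; the numbers and the function name `posterior` are illustrative.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior P(A_i | B) for each hypothesis in a partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i) P(B | A_i)
    evidence = sum(joint)                                  # P(B), the denominator
    return [j / evidence for j in joint]

# Hypothetical numbers: A_1 = has condition, A_2 = does not; B = positive test
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]

post = posterior(priors, likelihoods)
print(post)
```

Even with a 95% hit rate, the low prior keeps the posterior for the first hypothesis well below one half in this made-up case.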

3. Probability Distributions #

3.1 Random Variables #

Random variable (a function from the sample space \( S \) to the real numbers):

\[ X:S\to\mathbb{R} \]

Discrete PMF #

\[ p(x)=P(X=x),\qquad \sum_x p(x)=1 \]

Continuous PDF #

\[ f(x)\ge 0,\qquad \int_{-\infty}^{\infty} f(x)\,dx=1 \]

Interval probability (continuous):

\[ P(a\le X\le b)=\int_{a}^{b} f(x)\,dx \]

CDF (both cases) #

\[ F(x)=P(X\le x) \]

3.1.3 Mean, Variance, and Covariance of Random Variables #

Expected value (discrete):

\[ E(X)=\sum_x x\,p(x) \]

Expected value (continuous):

\[ E(X)=\int_{-\infty}^{\infty} x\,f(x)\,dx \]

Variance:

\[ \operatorname{Var}(X)=E[(X-\mu)^2]=E(X^2)-[E(X)]^2 \]

Covariance:

\[ \operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y) \]
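
For a discrete random variable these are plain weighted sums. The sketch below assumes a fair six-sided die and checks that the definition of variance and the shortcut agree.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed example)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                    # E(X)
mean_sq = sum(x * x * p for x, p in pmf.items())             # E(X^2)

var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = mean_sq - mean ** 2

print(mean, var_definition, var_shortcut)
```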

3.2 Probability Distributions #

3.2.1 Bernoulli Distribution #

If \( X\sim \text{Bernoulli}(p) \) :

\[ P(X=1)=p,\qquad P(X=0)=1-p \]

Mean and variance:

\[ E(X)=p,\qquad \operatorname{Var}(X)=p(1-p) \]

3.2.2 Binomial Distribution #

If \( X\sim \text{Binomial}(n,p) \) :

\[ P(X=x)=\binom{n}{x}p^x(1-p)^{n-x},\qquad x=0,1,\dots,n \]

Mean and variance:

\[ E(X)=np,\qquad \operatorname{Var}(X)=np(1-p) \]
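
A minimal sketch of the binomial PMF, with a numerical check of the mean and variance formulas. The parameters `n = 10` and `p = 0.3` are arbitrary choices.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3     # arbitrary example parameters
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * q for x, q in enumerate(pmf))                # should be n p
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))   # should be n p (1 - p)

print(total, mean, var)
```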

3.2.3 Poisson Distribution #

If \( X\sim \text{Poisson}(\lambda) \) :

\[ P(X=x)=\frac{\lambda^x e^{-\lambda}}{x!},\qquad x=0,1,2,\dots \]

Mean and variance:

\[ E(X)=\lambda,\qquad \operatorname{Var}(X)=\lambda \]
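
The Poisson PMF has infinite support, so the sketch below truncates the sums at 60 terms, which is an approximation that is ample for the assumed rate `lam = 2.0`.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.0                 # arbitrary example rate
xs = range(60)            # truncated support; the tail beyond this is negligible here

total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(total, mean, var)
```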

3.2.4 Normal (Gaussian) Distribution #

If \( X\sim N(\mu,\sigma^2) \) :

\[ f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]

Standardisation:

\[ Z=\frac{X-\mu}{\sigma}\sim N(0,1) \]

Converting between \( x \) and \( z \):

\[ z=\frac{x-\mu}{\sigma}, \qquad x=\mu+z\sigma \]
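
Standardisation means a probability for \( X \) can be read from the standard normal. The sketch below uses `statistics.NormalDist` (Python 3.8+) with made-up parameters.

```python
from statistics import NormalDist

mu, sigma = 100, 15       # made-up population parameters
x = 130

z = (x - mu) / sigma                      # standardise
x_back = mu + z * sigma                   # convert back

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)          # P(Z <= z) on the standard scale

print(z, x_back, p_direct, p_standard)
```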

3.2.5 t, Chi-square, F (intro) #

This course introduces these distributions.

Quick reminders:

  • \( t \) distribution: mean-based inference when \( \sigma \) is unknown (small-sample setting)
  • \( \chi^2 \) distribution: variance-related inference and goodness-of-fit
  • \( F \) distribution: ratio of variances; used in ANOVA

4. Hypothesis Testing #

4.1 Sampling and sampling distributions #

Standard error of the mean:

\[ \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}} \]

4.2 Central Limit Theorem (core identities) #

Mean and SD of sampling distribution of sample mean:

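\[ E(\bar{x})=\mu, \qquad \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}} \]

For large \( n \) , the sampling distribution of \( \bar{x} \) is approximately normal:

\[ \bar{x}\ \overset{\text{approx.}}{\sim}\ N\left(\mu,\frac{\sigma^2}{n}\right) \]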

4.3 Estimation (confidence intervals) #

CI for population mean ( \( \sigma \) known):

\[ \mu\in \bar{x}\pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

CI for population proportion:

\[ p\in \hat{p}\pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
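
Both intervals have the form estimate ± critical value × standard error. A minimal sketch, with the critical value taken from `statistics.NormalDist().inv_cdf`; the function names and inputs are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sigma, n, confidence=0.95):
    """CI for the population mean when sigma is known."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def proportion_ci(p_hat, n, confidence=0.95):
    """CI for a population proportion (normal approximation)."""
    half_width = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

print(mean_ci(50, 10, 100))        # made-up: xbar = 50, sigma = 10, n = 100
print(proportion_ci(0.4, 600))     # made-up: p_hat = 0.4, n = 600
```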

4.4 Testing of hypothesis #

Test statistic for a mean (one-sample Z, \( \sigma \) known):

\[ Z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \]

Test statistic for a proportion (one-sample Z):

\[ Z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
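
The two test statistics can be sketched as below. The two-sided p-value helper is an addition for illustration and is not a formula from this sheet; all inputs are made up.

```python
from math import sqrt
from statistics import NormalDist

def z_mean(xbar, mu0, sigma, n):
    """One-sample Z statistic for a mean (sigma known)."""
    return (xbar - mu0) / (sigma / sqrt(n))

def z_proportion(p_hat, p0, n):
    """One-sample Z statistic for a proportion."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def two_sided_p_value(z):
    """P(|Z| >= |z|) under the standard normal (illustrative helper)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z1 = z_mean(52, 50, 10, 100)          # made-up sample summary
z2 = z_proportion(0.56, 0.5, 100)     # made-up sample summary
print(z1, two_sided_p_value(z1))
print(z2, two_sided_p_value(z2))
```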

4.5 Maximum likelihood (MLE) #

Likelihood (IID):

\[ L(\theta)=\prod_{i=1}^{n} f(x_i\mid \theta) \]

Log-likelihood:

\[ \ell(\theta)=\sum_{i=1}^{n}\log f(x_i\mid \theta) \]

MLE:

\[ \hat{\theta}_{MLE}=\arg\max_{\theta} L(\theta)=\arg\max_{\theta}\ell(\theta) \]
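
As a rough numerical illustration, the sketch below maximises the log-likelihood of a Bernoulli sample over a grid of candidate values. The Bernoulli model, the data, and the grid search are assumptions chosen for simplicity; for this model the maximiser is known to be the sample proportion.

```python
from math import log

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]     # made-up Bernoulli sample (7 ones)

def log_likelihood(p, xs):
    """Sum of log f(x_i | p) for a Bernoulli(p) model."""
    return sum(log(p) if x == 1 else log(1 - p) for x in xs)

# Candidate values of p strictly between 0 and 1
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))

print(p_hat, sum(data) / len(data))
```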

Extras (useful in practice) #

These formulas appear in the provided sheets and are useful for extra practice.

Uniform distribution #

If \( X\sim U(a,b) \) :

\[ f(x)=\frac{1}{b-a},\qquad a\le x\le b \]

\[ P(c\le X\le d)=\frac{d-c}{b-a} \]

\[ E(X)=\frac{a+b}{2},\qquad \sigma=\frac{b-a}{\sqrt{12}} \]
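
A short sketch of the uniform formulas with made-up endpoints. It assumes \( a\le c\le d\le b \).

```python
from math import sqrt

a, b = 2, 10          # made-up endpoints of U(a, b)
c, d = 4, 7           # sub-interval with a <= c <= d <= b

density = 1 / (b - a)                # f(x) on [a, b]
p_interval = (d - c) / (b - a)       # P(c <= X <= d)
mean = (a + b) / 2
sd = (b - a) / sqrt(12)

print(density, p_interval, mean, sd)
```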

Exponential distribution #

If \( X\sim \text{Exponential}(\lambda) \) :

\[ f(x)=\lambda e^{-\lambda x},\qquad x\ge 0 \]

\[ P(X\le n)=1-e^{-\lambda n},\qquad P(X\ge n)=e^{-\lambda n} \]

\[ E(X)=\frac{1}{\lambda},\qquad \operatorname{Var}(X)=\frac{1}{\lambda^2} \]
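
A minimal sketch of the exponential formulas, with an assumed rate of 0.5 and an arbitrary cut-off of 2.

```python
from math import exp

lam = 0.5       # assumed rate parameter
cutoff = 2      # arbitrary value at which to evaluate the probabilities

p_at_most = 1 - exp(-lam * cutoff)     # P(X <= cutoff)
p_at_least = exp(-lam * cutoff)        # P(X >= cutoff)
mean = 1 / lam
variance = 1 / lam ** 2

print(p_at_most, p_at_least, mean, variance)
```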
