Hypothesis Testing #

Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.

It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.

Key takeaway:
Hypothesis testing is not about proving something with certainty.

It is about asking:

If the null hypothesis were true, how surprising would this sample result be?


Where This Fits in the Course ☆ #

This topic belongs to Module 4: Hypothesis Testing.

It covers:

  • Sampling: random and stratified sampling
  • Sampling distribution
  • Central Limit Theorem
  • Interval estimation and confidence level
  • Testing of hypothesis
  • Mean-based tests
  • Proportion-based tests
  • ANOVA: single factor and dual factor
  • Maximum likelihood

Big Picture #

Hypothesis testing follows a repeatable workflow.

```mermaid
flowchart LR
    A[Research Question] --> B[Set H0 and H1]
    B --> C[Choose Test]
    C --> D[Compute Test Statistic]
    D --> E[Find p-value or Critical Value]
    E --> F[Decision]
    F --> G[Interpret in Context]

    style A fill:#E1F5FE
    style B fill:#FFF9C4
    style C fill:#EDE7F6
    style D fill:#C8E6C9
    style E fill:#FFF9C4
    style F fill:#E1F5FE
    style G fill:#C8E6C9
```

Null and Alternative Hypotheses ☆ #

A hypothesis is a claim about a population parameter.

The two main hypotheses are:

| Hypothesis | Meaning | Example |
| --- | --- | --- |
| Null hypothesis | Default assumption; no change, no difference, no effect | The mean waiting time is 5 minutes |
| Alternative hypothesis | What we look for evidence to support | The mean waiting time is not 5 minutes |

The null hypothesis is assumed true until the sample gives enough evidence against it.

Common Notation #

\[ H_0 : \mu = \mu_0 \] \[ H_1 : \mu \neq \mu_0 \]

Types of Tests ☆ #

| Test Type | Alternative Hypothesis | Meaning |
| --- | --- | --- |
| Two-tailed test | \( \mu \neq \mu_0 \) | Checks for any difference |
| Right-tailed test | \( \mu > \mu_0 \) | Checks if the value is greater |
| Left-tailed test | \( \mu < \mu_0 \) | Checks if the value is smaller |

Do not choose the tail after seeing the data.

The direction of the test must come from the question statement.


Significance Level and p-value ☆ #

The significance level \( \alpha \) is the probability we allow for rejecting the null hypothesis when it is actually true.

Common values are:

  • 0.10
  • 0.05
  • 0.01

The p-value measures how surprising the sample result is if the null hypothesis is true.

| Rule | Decision |
| --- | --- |
| p-value less than or equal to significance level | Reject \( H_0 \) |
| p-value greater than significance level | Fail to reject \( H_0 \) |

\[ p\text{-value} \leq \alpha \Rightarrow \text{Reject } H_0 \]
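
The decision rule is mechanical once the p-value is known. A minimal sketch in Python, where the significance level and p-value are made-up numbers rather than the output of any real test:

```python
# Minimal sketch of the decision rule with an illustrative p-value.
alpha = 0.05          # chosen significance level
p_value = 0.032       # hypothetical p-value from some test

if p_value <= alpha:
    print("Reject H0: the result is statistically significant at this level.")
else:
    print("Fail to reject H0: the evidence is not strong enough.")
```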

Type I and Type II Errors ☆ #

| True Situation | Decision | Error? |
| --- | --- | --- |
| \( H_0 \) is true | Reject \( H_0 \) | Type I error |
| \( H_0 \) is false | Fail to reject \( H_0 \) | Type II error |

\[ \alpha = P(\text{Type I error}) \] \[ \beta = P(\text{Type II error}) \] \[ \text{Power} = 1 - \beta \]
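
A small simulation can make \( \alpha \) concrete. The sketch below (assuming NumPy and SciPy, with invented population values) samples repeatedly from a population where \( H_0 \) is actually true and counts how often a t-test falsely rejects it:

```python
import numpy as np
from scipy import stats

# Rough simulation of the Type I error rate: sample from a population where
# H0 (mu = 5) is true and count how often the test rejects it anyway.
rng = np.random.default_rng(0)
alpha, n, trials, rejections = 0.05, 30, 5000, 0

for _ in range(trials):
    sample = rng.normal(loc=5, scale=2, size=n)        # H0 is true here
    _, p_value = stats.ttest_1samp(sample, popmean=5)
    if p_value <= alpha:
        rejections += 1

print(rejections / trials)   # should come out close to alpha (about 0.05)
```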

Sampling Distribution and Central Limit Theorem ☆ #

A sampling distribution describes the behaviour of a statistic, such as the sample mean, over repeated samples.

The Central Limit Theorem says that, for a sufficiently large sample size, the distribution of the sample mean becomes approximately normal even if the population itself is not normal.

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

The standard error of the mean is:

\[ SE = \frac{\sigma}{\sqrt{n}} \]
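
A short simulation can illustrate both ideas. The sketch below uses a skewed (exponential) population with invented parameters; the spread of the simulated sample means should come out close to \( \sigma / \sqrt{n} \):

```python
import numpy as np

# Illustrative sketch of the CLT: sample means from a skewed exponential
# population behave approximately normally, with spread near sigma / sqrt(n).
rng = np.random.default_rng(1)
scale, n = 10, 50                       # exponential mean = sd = scale

sample_means = rng.exponential(scale=scale, size=(10_000, n)).mean(axis=1)
print(sample_means.mean())              # close to the population mean (10)
print(sample_means.std())               # close to 10 / sqrt(50), the standard error
```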

Confidence Interval ☆ #

A confidence interval gives a plausible range for a population parameter.

For a population mean with known standard deviation:

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

For a population mean with unknown standard deviation, use the t-distribution:

\[ \bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \]

A wider confidence interval means more uncertainty.

A larger sample size normally gives a narrower interval.
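
As a hedged sketch, the t-interval can be computed with SciPy; the data values below are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# Sketch: 95% t-interval for a mean when sigma is unknown (invented data).
data = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7])
mean, se = data.mean(), stats.sem(data)          # sample mean and standard error
ci = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
print(ci)                                        # (lower, upper) bounds
```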


One-Sample Mean Test ☆ #

Use this when checking whether a population mean differs from a claimed value.

z-test for Mean #

Use when the population standard deviation is known, or when the sample size is large enough for the normal approximation to hold.

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

t-test for Mean #

Use when the population standard deviation is unknown and the sample standard deviation \( s \) is used in its place.

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Degrees of freedom:

\[ df = n - 1 \]
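
A minimal sketch of the one-sample t-test in Python, testing a claimed mean of 5 minutes against invented waiting-time data:

```python
import numpy as np
from scipy import stats

# Sketch: one-sample t-test against the claimed mean mu_0 = 5 (invented data).
waits = np.array([5.4, 6.1, 4.9, 5.8, 6.3, 5.2, 5.9, 6.0])
t_stat, p_value = stats.ttest_1samp(waits, popmean=5)   # two-tailed by default
print(t_stat, p_value)                                  # compare p_value with alpha
```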

Two-Sample Mean Test ☆ #

Use this when comparing two population means.

Independent Samples #

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
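
A hedged sketch using SciPy: `equal_var=False` gives Welch's test, which matches the unpooled formula above. The group data is invented for illustration:

```python
from scipy import stats

# Sketch: Welch's two-sample t-test on two invented groups of measurements.
group_a = [23, 25, 28, 22, 26, 27]
group_b = [30, 29, 27, 31, 33, 28]
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)
```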

Paired Samples #

Use when observations are naturally paired, such as before-and-after measurements.

\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \]
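
A minimal sketch of the paired test, using invented before-and-after scores:

```python
from scipy import stats

# Sketch: paired t-test; tests whether the mean of the differences is zero.
before = [72, 68, 75, 80, 66, 71]
after  = [75, 70, 78, 84, 69, 73]
t_stat, p_value = stats.ttest_rel(after, before)
print(t_stat, p_value)
```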

One-Proportion Test ☆ #

Use this when testing a claim about a population proportion.

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]

Where:

  • \( \hat{p} \) is the sample proportion
  • \( p_0 \) is the claimed population proportion
  • \( n \) is sample size
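
The statistic can be computed directly from the formula. A sketch with invented counts (58 successes out of 100 trials against a claimed \( p_0 = 0.5 \)):

```python
import numpy as np
from scipy import stats

# Sketch: one-proportion z-test computed straight from the formula above.
x, n, p0 = 58, 100, 0.5
p_hat = x / n
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_value = 2 * stats.norm.sf(abs(z))     # two-tailed p-value
print(z, p_value)
```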

Several Proportions and Chi-Square Test ☆ #

When comparing categorical distributions, use a chi-square test.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where:

  • \( O \) is observed frequency
  • \( E \) is expected frequency

Expected count in a contingency table:

\[ E = \frac{(\text{row total})(\text{column total})}{\text{grand total}} \]
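
SciPy computes the statistic, the p-value, and the expected counts in one call. A sketch on an invented 2-by-3 contingency table:

```python
import numpy as np
from scipy import stats

# Sketch: chi-square test of independence on an invented contingency table.
observed = np.array([[30, 20, 25],
                     [25, 35, 20]])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(chi2, p_value, dof)
print(expected)   # expected counts: (row total)(column total) / grand total
```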

ANOVA: Analysis of Variance ☆ #

ANOVA compares means across more than two groups.

Instead of doing many pairwise t-tests, ANOVA checks whether at least one group mean differs.

One-Way ANOVA #

Use one categorical factor and one numerical response.

Example:

  • Factor: teaching method
  • Response: exam score

Hypotheses:

\[ H_0 : \mu_1 = \mu_2 = \cdots = \mu_k \] \[ H_1 : \text{At least one mean is different} \]

F Statistic #

\[ F = \frac{MSB}{MSW} \]

Where:

  • \( MSB \) is mean square between groups
  • \( MSW \) is mean square within groups

\[ MSB = \frac{SSB}{k-1} \] \[ MSW = \frac{SSW}{N-k} \]

where \( SSB \) and \( SSW \) are the between-group and within-group sums of squares, \( k \) is the number of groups, and \( N \) is the total number of observations.
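
A minimal sketch of a one-way ANOVA in Python, using invented exam scores for three teaching methods:

```python
from scipy import stats

# Sketch: one-way ANOVA across three groups of invented exam scores.
method_1 = [78, 82, 85, 80, 79]
method_2 = [88, 84, 90, 86, 87]
method_3 = [75, 77, 74, 79, 76]
f_stat, p_value = stats.f_oneway(method_1, method_2, method_3)
print(f_stat, p_value)   # small p-value: at least one group mean differs
```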

Two-Way ANOVA #

Two-way ANOVA studies two factors at the same time.

It can check:

  • effect of factor A
  • effect of factor B
  • interaction between A and B
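
These three effects can be read off a two-way ANOVA table. The sketch below uses statsmodels; the column names (score, method, school) and the small data set are invented for illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Sketch: two-way ANOVA with an interaction term on a small invented dataset.
df = pd.DataFrame({
    "score":  [78, 82, 85, 80, 88, 84, 90, 86, 75, 77, 74, 79],
    "method": ["A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B"],
    "school": ["X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z"],
})
model = smf.ols("score ~ C(method) * C(school)", data=df).fit()
print(anova_lm(model, typ=2))   # rows for method, school, and their interaction
```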

In ANOVA, rejecting the null hypothesis tells you that at least one mean differs.

It does not automatically tell which groups differ.

For that, post-hoc tests are needed.


Maximum Likelihood Estimation ☆ #

Maximum Likelihood Estimation is a method for estimating parameters by choosing the parameter values that make the observed data most likely.

If data points are independent:

\[ L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \]

The log-likelihood is usually easier to optimise:

\[ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta) \]

MLE chooses:

\[ \hat{\theta}_{MLE} = \arg\max_{\theta} \ell(\theta) \]

Example: Bernoulli Parameter #

For Bernoulli data, the MLE of probability of success is the sample proportion.

\[ \hat{p}_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n} \]
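
This can be checked numerically: maximising the Bernoulli log-likelihood over \( p \) should land on the sample proportion. A sketch with invented 0/1 data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: numerical MLE for the Bernoulli parameter versus the closed form.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def neg_log_likelihood(p):
    # Negative log-likelihood of independent Bernoulli(p) observations.
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)     # numerical MLE
print(x.mean())     # closed-form MLE: the sample proportion
```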

Exam Workflow ☆ #

Use this decision table in exams.

| Question Clue | Likely Test |
| --- | --- |
| One sample mean, population standard deviation known | z-test |
| One sample mean, population standard deviation unknown | t-test |
| Two group means | two-sample t-test |
| Before-and-after data | paired t-test |
| One population proportion | one-proportion z-test |
| Categorical counts | chi-square test |
| More than two group means | ANOVA |
| Estimate parameter from likelihood | MLE |

Common Mistakes ☆ #

  • Saying “accept the null hypothesis” instead of “fail to reject the null hypothesis”.
  • Confusing confidence level with significance level.
  • Using z-test when t-test is required.
  • Forgetting degrees of freedom.
  • Treating correlation as causation.
  • Running multiple t-tests instead of ANOVA for more than two means.

Why It Matters in AI and ML #

Hypothesis testing is used in AI and ML to:

  • compare two models statistically
  • check whether a new model improvement is significant
  • evaluate A/B tests
  • assess feature importance
  • validate assumptions in regression
  • analyse experiment results

In ML, a higher score from one model is not always meaningful.

Hypothesis testing helps decide whether the improvement is likely real or just due to random sampling variation.


Quick Revision Checklist ☆ #

  • Can I define \( H_0 \) and \( H_1 \) correctly?
  • Do I know whether the test is one-tailed or two-tailed?
  • Do I know which statistic to use?
  • Did I compute the standard error correctly?
  • Did I compare the p-value with \( \alpha \)?
  • Did I write the final conclusion in the language of the problem?
