Hypothesis Testing #
Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.
It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.
Key takeaway:
Hypothesis testing is not about proving something with certainty. It is about asking:
If the null hypothesis were true, how surprising would this sample result be?
Where This Fits in the Course ☆ #
This topic belongs to Module 4: Hypothesis Testing.
It covers:
- Sampling: random and stratified sampling
- Sampling distribution
- Central Limit Theorem
- Interval estimation and confidence level
- Testing of hypothesis
- Mean-based tests
- Proportion-based tests
- ANOVA: single factor and dual factor
- Maximum likelihood
Big Picture #
Hypothesis testing follows a repeatable workflow.
```mermaid
flowchart LR
    A[Research Question] --> B[Set H0 and H1]
    B --> C[Choose Test]
    C --> D[Compute Test Statistic]
    D --> E[Find p-value or Critical Value]
    E --> F[Decision]
    F --> G[Interpret in Context]
    style A fill:#E1F5FE
    style B fill:#FFF9C4
    style C fill:#EDE7F6
    style D fill:#C8E6C9
    style E fill:#FFF9C4
    style F fill:#E1F5FE
    style G fill:#C8E6C9
```
Null and Alternative Hypotheses ☆ #
A hypothesis is a claim about a population parameter.
The two main hypotheses are:
| Hypothesis | Meaning | Example |
|---|---|---|
| Null hypothesis | Default assumption; no change, no difference, no effect | The mean waiting time is 5 minutes |
| Alternative hypothesis | What we look for evidence to support | The mean waiting time is not 5 minutes |
Common Notation #
\[ H_0 : \mu = \mu_0 \]

\[ H_1 : \mu \neq \mu_0 \]

Types of Tests ☆ #
| Test Type | Alternative Hypothesis | Meaning |
|---|---|---|
| Two-tailed test | \( \mu \neq \mu_0 \) | Checks for any difference |
| Right-tailed test | \( \mu > \mu_0 \) | Checks if value is greater |
| Left-tailed test | \( \mu < \mu_0 \) | Checks if value is smaller |
Do not choose the tail after seeing the data.
The direction of the test must come from the question statement.
Significance Level and p-value ☆ #
The significance level, written \( \alpha \), is the probability we are willing to accept of rejecting the null hypothesis when it is actually true.
Common values are:
- 0.10
- 0.05
- 0.01
The p-value measures how surprising the sample result is if the null hypothesis is true.
| Rule | Decision |
|---|---|
| p-value less than or equal to significance level | Reject \( H_0 \) |
| p-value greater than significance level | Fail to reject \( H_0 \) |
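The rule above can be sketched in Python with only the standard library; the statistic value and \( \alpha \) below are invented for illustration:

```python
from statistics import NormalDist

def two_tailed_p_value(z: float) -> float:
    """p-value for a two-tailed z-test: P(|Z| >= |z|) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05   # chosen significance level (illustrative)
z = 2.10       # hypothetical observed test statistic
p = two_tailed_p_value(z)
decision = "Reject H0" if p <= alpha else "Fail to reject H0"
print(round(p, 4), decision)   # → 0.0357 Reject H0
```

Here \( p \approx 0.0357 \le 0.05 \), so the null hypothesis is rejected at the 5% level.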
Type I and Type II Errors ☆ #
| True Situation | Decision | Error? |
|---|---|---|
| \( H_0 \) is true | Reject \( H_0 \) | Type I error |
| \( H_0 \) is false | Fail to reject \( H_0 \) | Type II error |
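The Type I error rate can be observed directly by simulation: when \( H_0 \) is true and we test at \( \alpha = 0.05 \), roughly 5% of samples are still rejected by chance. A minimal sketch (all parameter values invented):

```python
import random
from statistics import NormalDist, mean

random.seed(42)                       # reproducible illustration
mu0, sigma, n = 50.0, 10.0, 25        # H0 is actually true in this simulation
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed cutoff for alpha = 0.05

trials, rejections = 2000, 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1               # a Type I error: H0 true but rejected

print(rejections / trials)            # close to 0.05
```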
Sampling Distribution and Central Limit Theorem ☆ #
A sampling distribution describes the behaviour of a statistic, such as the sample mean, over repeated samples.
The Central Limit Theorem says that for a sufficiently large sample size, the distribution of the sample mean becomes approximately normal, even if the original data is not perfectly normal.
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

The standard error of the mean is:

\[ SE = \frac{\sigma}{\sqrt{n}} \]

Confidence Interval ☆ #
A confidence interval gives a plausible range for a population parameter.
For a population mean with known standard deviation:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

For a population mean with unknown standard deviation, use the t-distribution:

\[ \bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \]

A wider confidence interval means more uncertainty.
A larger sample size normally gives a narrower interval.
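As a sketch, the z-interval with known \( \sigma \) can be computed with the standard library alone; the sample values are invented:

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # z_{alpha/2}
    margin = z * sigma / n ** 0.5                    # critical value times SE
    return xbar - margin, xbar + margin

lo, hi = z_confidence_interval(xbar=5.2, sigma=1.0, n=25)
print(round(lo, 3), round(hi, 3))   # → 4.808 5.592
```

Passing a larger `n` to the same function narrows the interval, matching the note above.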
One-Sample Mean Test ☆ #
Use this when checking whether a population mean differs from a claimed value.
z-test for Mean #
Use when population standard deviation is known or sample size is large.
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

t-test for Mean #
Use when population standard deviation is unknown and the sample standard deviation is used.
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Degrees of freedom:

\[ df = n - 1 \]

Two-Sample Mean Test ☆ #
Use this when comparing two population means.
Independent Samples #
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Paired Samples #
Use when observations are naturally paired, such as before-and-after measurements.
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \]

One-Proportion Test ☆ #
Use this when testing a claim about a population proportion.
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]

Where:
- \( \hat{p} \) is the sample proportion
- \( p_0 \) is the claimed population proportion
- \( n \) is sample size
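A sketch of this test in Python (the counts and claimed proportion are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)   # SE under H0 uses p0, not p_hat
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_proportion_z(successes=58, n=100, p0=0.5)
print(round(z, 2), round(p, 4))   # → 1.6 0.1096
```

The standard error is computed from \( p_0 \), not \( \hat{p} \), because the test statistic is built under the assumption that \( H_0 \) is true.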
Several Proportions and Chi-Square Test ☆ #
When comparing categorical distributions, use a chi-square test.
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where:
- \( O \) is observed frequency
- \( E \) is expected frequency
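The statistic is a straightforward sum over cells. A small sketch with invented counts (a fairness check on 60 die rolls, expecting 10 per face):

```python
def chi_square_stat(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 14, 6]   # hypothetical roll counts per face
expected = [10] * 6                # fair die: 60 / 6 = 10 per face
print(chi_square_stat(observed, expected))   # → 4.2
```

The result is then compared with a chi-square critical value at the appropriate degrees of freedom.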
Expected count in a contingency table:
\[ E = \frac{(\text{row total})(\text{column total})}{\text{grand total}} \]

ANOVA: Analysis of Variance ☆ #
ANOVA compares means across more than two groups.
Instead of doing many pairwise t-tests, ANOVA checks whether at least one group mean differs.
One-Way ANOVA #
Use one categorical factor and one numerical response.
Example:
- Factor: teaching method
- Response: exam score
Hypotheses:
\[ H_0 : \mu_1 = \mu_2 = \cdots = \mu_k \]

\[ H_1 : \text{At least one mean is different} \]

F Statistic #

\[ F = \frac{MSB}{MSW} \]

Where:
- \( MSB \) is mean square between groups
- \( MSW \) is mean square within groups
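The F statistic can be computed by hand from the definitions above. A sketch with invented exam scores for three teaching methods:

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = MSB / MSW for a list of groups (each a list of numbers)."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)   # grand mean of all data
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)                          # between-group mean square
    msw = ssw / (n_total - k)                    # within-group mean square
    return msb / msw

groups = [[85, 90, 88], [78, 82, 80], [91, 94, 89]]   # hypothetical scores
print(round(one_way_anova_f(groups), 2))   # → 18.06
```

Here df = (2, 6); comparing F with an F-table critical value decides whether to reject \( H_0 \).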
Two-Way ANOVA #
Two-way ANOVA studies two factors at the same time.
It can check:
- effect of factor A
- effect of factor B
- interaction between A and B
In ANOVA, rejecting the null hypothesis tells you that at least one mean differs.
It does not automatically tell which groups differ.
For that, post-hoc tests are needed.
Maximum Likelihood Estimation ☆ #
Maximum Likelihood Estimation is a method for estimating parameters by choosing the parameter values that make the observed data most likely.
If data points are independent:
\[ L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \]

The log-likelihood is usually easier to optimise:
\[ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta) \]

MLE chooses:
\[ \hat{\theta}_{MLE} = \arg\max_{\theta} \ell(\theta) \]

Example: Bernoulli Parameter #
For Bernoulli data, the MLE of the probability of success is the sample proportion.
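This can be checked numerically: the Bernoulli log-likelihood \( \ell(p) = s \log p + (n - s)\log(1 - p) \), where \( s \) is the number of successes, peaks at the sample proportion. A grid-search sketch on invented data:

```python
from math import log

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical Bernoulli sample
n, s = len(data), sum(data)             # n = 10, s = 7 successes

def log_likelihood(p):
    return s * log(p) + (n - s) * log(1 - p)

# Search a fine grid for the maximiser of the log-likelihood
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=log_likelihood)
print(p_mle, s / n)   # both are 0.7: the MLE equals the sample proportion
```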
\[ \hat{p}_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Exam Workflow ☆ #
Use this decision table in exams.
| Question Clue | Likely Test |
|---|---|
| One sample mean, population standard deviation known | z-test |
| One sample mean, population standard deviation unknown | t-test |
| Two group means | two-sample t-test |
| Before-and-after data | paired t-test |
| One population proportion | one-proportion z-test |
| Categorical counts | chi-square test |
| More than two group means | ANOVA |
| Estimate parameter from likelihood | MLE |
Common Mistakes ☆ #
- Saying “accept the null hypothesis” instead of “fail to reject the null hypothesis”.
- Confusing confidence level with significance level.
- Using z-test when t-test is required.
- Forgetting degrees of freedom.
- Treating correlation as causation.
- Running multiple t-tests instead of ANOVA for more than two means.
Why It Matters in AI and ML #
Hypothesis testing is used in AI and ML to:
- compare two models statistically
- check whether a new model improvement is significant
- evaluate A/B tests
- assess feature importance
- validate assumptions in regression
- analyse experiment results
In ML, a higher score from one model is not always meaningful.
Hypothesis testing helps decide whether the improvement is likely real or just due to random sampling variation.
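For example, per-fold scores from cross-validation can be compared with a paired t-test (the fold scores below are invented):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical accuracy per cross-validation fold for two models
model_a = [0.81, 0.79, 0.84, 0.80, 0.82]
model_b = [0.83, 0.80, 0.86, 0.83, 0.84]

diffs = [b - a for a, b in zip(model_a, model_b)]
t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))   # paired t statistic
print(round(t, 2))   # → 6.32
```

With df = 4 the two-tailed critical t at the 5% level is about 2.776, so a statistic near 6.3 suggests a real difference; in practice cross-validation folds are not fully independent, so treat this as a rough check rather than an exact test.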
Quick Revision Checklist ☆ #
- Can I define \( H_0 \) and \( H_1 \) correctly?
- Do I know whether the test is one-tailed or two-tailed?
- Do I know which statistic to use?
- Did I compute the standard error correctly?
- Did I compare p-value with \( \alpha \) ?
- Did I write the final conclusion in the language of the problem?