Hypothesis Testing
Hypothesis testing is a statistical method for deciding whether sample evidence is strong enough to reject an initial assumption (the null hypothesis) about a population.
It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.
Key takeaway:
Hypothesis testing is not about proving something with certainty.
It is about asking:
If the null hypothesis were true, how surprising would this sample result be?
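The question above can be made concrete with a p-value. The following is a minimal sketch, assuming a two-sided one-sample z-test with a known population standard deviation; the sample values, `mu0`, `sigma`, and the 0.05 significance level are made up for illustration and are not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    n = len(sample)
    # Standardize the gap between the sample mean and the hypothesized mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability, if H0 were true, of a statistic at least this far from zero.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical measurements (illustrative only).
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 51.5, 50.3]
z, p = z_test_p_value(sample, mu0=50.0, sigma=2.0)

alpha = 0.05  # assumed significance level
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

A small p-value means the sample would be surprising if the null hypothesis were true. It does not prove the alternative with certainty.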
Prediction & Forecasting
Prediction and forecasting use statistical models to estimate unknown or future values.
In this module, the focus is on correlation, regression, and time series forecasting.
Key takeaway:
Prediction estimates a value using a model.
Forecasting is prediction where the order of time matters.
- Correlation
- Regression
- Time series analysis
- Components of time series data
- Moving average and weighted moving average
- AR model
- ARMA model
- ARIMA model
- SARIMA and SARIMAX
- VAR and VARMAX
- Simple exponential smoothing
Prediction vs Forecasting ☆
| Concept | Meaning | Example |
|---|---|---|
| Prediction | Estimate an unknown output | Predict house price from area and rooms |
| Forecasting | Predict future values using time order | Forecast sales for next month |
All forecasting is prediction, but not all prediction is forecasting.
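One way to see the difference is to check what happens when the row order changes. The sketch below is illustrative only: `fit_line` and `moving_average_forecast` are hypothetical helper names, and all numbers are made up. Shuffling the rows leaves a least-squares price prediction unchanged, while reversing a sales series changes a moving-average forecast.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window


# Prediction: house price from area (made-up data). Row order is irrelevant.
areas = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 310, 360]
a, b = fit_line(areas, prices)

pairs = list(zip(areas, prices))
random.Random(0).shuffle(pairs)  # reorder the rows
a_shuffled, b_shuffled = fit_line(*zip(*pairs))
print(a + b * 90, a_shuffled + b_shuffled * 90)  # same prediction up to rounding

# Forecasting: next month's sales (made-up data). Time order matters.
sales = [100, 104, 110, 115, 121, 128]
forecast = moving_average_forecast(sales)
forecast_reversed = moving_average_forecast(sales[::-1])
print(forecast, forecast_reversed)  # different forecasts
```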
Overall Workflow
flowchart LR
A[Data] --> B[Explore Pattern]
B --> C[Choose Model]
C --> D[Train or Fit]
D --> E[Validate]
E --> F[Predict or Forecast]
F --> G[Interpret Error]
style A fill:#E1F5FE
style B fill:#C8E6C9
style C fill:#FFF9C4
style D fill:#EDE7F6
style E fill:#C8E6C9
style F fill:#E1F5FE
style G fill:#FFF9C4
Correlation ☆
Correlation measures the direction and strength of the linear relationship between two variables.
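As a minimal sketch, the Pearson correlation coefficient can be computed directly from its definition. The function name `pearson_r` and the data are assumptions for illustration. The sign of the result gives the direction and the magnitude gives the strength; the last example shows that a perfect but non-linear relationship can still have zero correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))  # perfect increasing line
print(pearson_r(x, [8, 6, 4, 2]))  # perfect decreasing line

# y = x^2 on a symmetric range: fully dependent on x, but not linearly.
x_sym = [-2, -1, 0, 1, 2]
print(pearson_r(x_sym, [v * v for v in x_sym]))
```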
Gaussian Mixture Model & Expectation Maximization
A Gaussian Mixture Model represents data as a weighted combination of multiple Gaussian distributions.
It is commonly used for soft clustering and density estimation.
Key takeaway:
K-means gives hard cluster membership.
GMM gives probabilities of belonging to each cluster.
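The contrast can be sketched for a one-dimensional mixture with two components. The mixing weights, means, and standard deviations below are assumed values rather than fitted ones, and `hard_assignment` only imitates the K-means assignment step by reusing the component means as centroids.

```python
from statistics import NormalDist

# Assumed (not fitted) mixture parameters, for illustration only.
weights = [0.5, 0.5]
components = [NormalDist(0.0, 1.0), NormalDist(4.0, 1.0)]


def responsibilities(x):
    """Soft membership: probability that x came from each component."""
    weighted = [w * c.pdf(x) for w, c in zip(weights, components)]
    total = sum(weighted)
    return [v / total for v in weighted]


def hard_assignment(x):
    """K-means-style membership: index of the nearest mean, nothing else."""
    means = [c.mean for c in components]
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


for x in (0.0, 1.5, 2.0):
    probs = [round(r, 3) for r in responsibilities(x)]
    print(x, hard_assignment(x), probs)
```

A point close to one mean gets a responsibility near 1 for that component, while a point midway between the means is split evenly, which is information a hard label discards.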
- Gaussian Mixture Model
- soft clustering
- mixing coefficients
- latent variables
- likelihood and log-likelihood
- Expectation-Maximization algorithm
- E-step and M-step
- responsibilities
- convergence
Motivation ☆
Many real datasets are not described well by one Gaussian distribution.