Statistical Methods #

Statistical methods help you turn raw data into reliable conclusions, while understanding uncertainty, variability, and confidence.

Statistics provides the language and tools for reasoning about data, uncertainty, and inference.

Machine learning relies on statistics to understand how data behaves, to draw conclusions from it, and to validate models.

Collect Data
Present & Organise Data (in a systematic manner)
Analyse Data
Draw Inferences from the Data
Make Decisions Based on the Data

Data - Types #

flowchart TD
    A[(Data)] --> B["Categorical (Qualitative)"]
    A --> C["Numerical (Quantitative)"]

    B --> B1[Nominal]
    B --> B2[Ordinal]

    C --> C1[Discrete]
    C --> C2[Continuous]

    C2 --> C21[Interval]
    C2 --> C22[Ratio]

    %% Styling
    style A fill:#E1F5FE,stroke:#333
    style B fill:#90CAF9,stroke:#333
    style B1 fill:#90CAF9,stroke:#333
    style B2 fill:#90CAF9,stroke:#333
    style C fill:#FFF9C4,stroke:#333
    style C1 fill:#FFF9C4,stroke:#333
    style C2 fill:#FFF9C4,stroke:#333
    style C21 fill:#FFF9C4,stroke:#333
    style C22 fill:#FFF9C4,stroke:#333

Categorical (Qualitative) #
Expresses a qualitative attribute, e.g. hair color, eye color.
- Nominal: no natural ordering exists, e.g. hair color, eye color.
- Ordinal: a meaningful order exists, e.g. health rated as poor, reasonable, good, or excellent.
Numerical (Quantitative) #
Expressed as numbers, e.g. height, weight, number of people.
- Discrete: countable, typically whole numbers, e.g. number of people.
- Continuous: measured rather than counted, and can take infinitely many values within a range, e.g. height.
  - Interval: differences between values are meaningful, but ratios are not, because the scale has no inherently defined zero, e.g. temperature in degrees Celsius.
  - Ratio: ratios of values are meaningful because the scale has an inherently defined zero, e.g. weight.

If the data is a label → categorical.
If the data is a count → discrete.
If the data is a measurement → continuous.
If “twice as much” makes sense → ratio scale.
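
The "twice as much" test can be checked with a little arithmetic. The sketch below is illustrative only: the temperatures and weights are made-up values, and `celsius_to_kelvin` and `KG_TO_LB` are names introduced here. It shows that a ratio of Celsius readings changes when the zero point changes, while a ratio of weights survives a change of unit.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15

# Hypothetical readings, chosen only for illustration.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale: same gap in both units.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are not: 20 °C is not "twice as hot" as 10 °C.
naive_ratio = t2_c / t1_c                                        # 2.0
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)   # about 1.035

# Weight is on a ratio scale: the ratio is the same in kg and in pounds.
KG_TO_LB = 2.20462
w1_kg, w2_kg = 40.0, 80.0
ratio_kg = w2_kg / w1_kg
ratio_lb = (w2_kg * KG_TO_LB) / (w1_kg * KG_TO_LB)

print(f"Celsius ratio: {naive_ratio:.3f}, kelvin ratio: {true_ratio:.3f}")
print(f"Weight ratio in kg: {ratio_kg:.3f}, in lb: {ratio_lb:.3f}")
```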

Role of Statistics in AI and ML #

Helps understand data distributions
Enables model evaluation
Supports decision-making under uncertainty
Forms the basis of many ML algorithms

Core #

1. Descriptive Statistics #

Summarises and describes data.

Mean, median, mode
Variance and standard deviation
Histograms and distributions
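
As a minimal sketch, these summaries can be computed with Python's standard `statistics` module. The dataset is invented for illustration, and the text histogram is just one simple way to look at the distribution.

```python
import statistics
from collections import Counter

# Hypothetical sample, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value

pvar = statistics.pvariance(data)   # population variance (divide by n)
pstd = statistics.pstdev(data)      # population standard deviation
svar = statistics.variance(data)    # sample variance (divide by n - 1)

# A crude text histogram: one '#' per occurrence of each value.
hist = Counter(data)
for value in sorted(hist):
    print(f"{value:>2} | {'#' * hist[value]}")

print(f"mean={mean}, median={median}, mode={mode}")
print(f"population variance={pvar}, population std={pstd}, sample variance={svar:.3f}")
```

Note the difference between `pvariance` and `variance`: the first treats the data as the whole population, the second as a sample from a larger one.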

2. Probability Distributions #

Models how data behaves.

Normal (Gaussian) distribution
Binomial and Poisson distributions
Continuous vs discrete variables
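
The three distributions named above can be sketched with the standard library alone. `binomial_pmf` and `poisson_pmf` are helper names introduced here for illustration; `NormalDist` is the standard-library class. The parameter values are arbitrary examples.

```python
import math
from statistics import NormalDist

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) when events occur at an average rate lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Continuous: the standard normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
within_one_sigma = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# Discrete: probability of 5 heads in 10 fair coin flips.
p_5_heads = binomial_pmf(5, 10, 0.5)

# Discrete: probability of no events when 2 are expected on average.
p_0_events = poisson_pmf(0, 2.0)

print(f"P(-1 < Z < 1)   = {within_one_sigma:.4f}")
print(f"P(5 heads / 10) = {p_5_heads:.4f}")
print(f"P(0 events)     = {p_0_events:.4f}")
```

A continuous distribution assigns probability to ranges through its CDF, while a discrete one assigns probability to individual values through its PMF.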

3. Statistical Inference #

Draws conclusions from data samples.

Sampling techniques
Confidence intervals
Hypothesis testing
p-values
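
To make confidence intervals, hypothesis tests, and p-values concrete, here is a minimal sketch of a two-sided one-sample z-test. It rests on loud assumptions: the sample is invented, the population standard deviation `SIGMA` is assumed known, and the null-hypothesis mean `MU_0` and significance level `ALPHA` are arbitrary choices. With an unknown standard deviation a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample, invented for illustration.
sample = [102, 98, 105, 110, 97, 103, 108, 101, 99, 107]

SIGMA = 5.0    # assumed known population standard deviation
MU_0 = 100.0   # mean under the null hypothesis
ALPHA = 0.05   # significance level

n = len(sample)
sample_mean = mean(sample)
std_error = SIGMA / sqrt(n)  # standard error of the mean

# 95% confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

# Two-sided z-test of H0: population mean == MU_0.
z = (sample_mean - MU_0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < ALPHA

print(f"mean={sample_mean}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
print(f"z={z:.3f}, p-value={p_value:.4f}, reject H0: {reject_null}")
```

The two views agree by construction: the test rejects the null hypothesis at level `ALPHA` exactly when `MU_0` falls outside the matching confidence interval.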

4. Regression and Correlation #

Analyses relationships between variables.

Linear regression
Correlation coefficients
Causation vs correlation
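
A minimal sketch of simple linear regression and the Pearson correlation coefficient, written out from their textbook formulas. The data points are invented, and `pearson_r` and `fit_line` are helper names introduced here for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical paired observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)

print(f"r = {r:.4f}")
print(f"y = {slope:.3f} * x + {intercept:.3f}")
```

A value of `r` close to 1 says the points lie near a rising straight line. It says nothing about whether changes in `x` cause changes in `y`.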

5. Model Evaluation #

Assesses performance and reliability.

Bias and variance
Overfitting and underfitting
Error estimation
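
Underfitting and overfitting show up when error is estimated on data the model has not seen. The sketch below is a toy illustration, not a recommended evaluation protocol: the data are invented (roughly y = 2x plus noise), and the three models and all helper names are assumptions made for this example.

```python
from statistics import mean

# Hypothetical data, invented for illustration: roughly y = 2x plus small noise.
train = [(0, 0.3), (1, 1.6), (2, 4.5), (3, 5.7), (4, 8.4), (5, 9.8)]
test = [(0.4, 0.9), (1.6, 3.1), (2.4, 5.0), (3.6, 7.0), (4.4, 8.9)]

def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on data."""
    errors = [(y - model(x)) ** 2 for x, y in data]
    return sum(errors) / len(errors)

def fit_constant(data):
    """Too simple: always predicts the mean training target."""
    avg = mean([y for _, y in data])
    return lambda x: avg

def fit_line(data):
    """Least-squares straight line."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Too flexible: returns the target of the nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

models = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "memorizer": fit_memorizer(train),
}
results = {name: (mse(m, train), mse(m, test)) for name, m in models.items()}

for name, (train_err, test_err) in results.items():
    print(f"{name:<10} train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The constant model has high error on both sets (underfitting). The memorizer has zero training error but a much larger test error than the line (overfitting), which is why error should be estimated on held-out data.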

Conceptual View #

flowchart LR
    DATA[Observed Data]
    SAMPLE[Sampling]
    ANALYSIS[Statistical Analysis]
    INFER[Inference & Decisions]

    DATA --> SAMPLE
    SAMPLE --> ANALYSIS
    ANALYSIS --> INFER

Outcome #

Analyse datasets statistically
Make data-driven inferences
Evaluate ML models rigorously
Understand uncertainty and variability in predictions
