Machine Learning Workflow

Data is the foundation of any machine learning system. Data quality usually matters more than model complexity.

Data determines:

Bad data → bad model (even with perfect algorithms).

Raw data is rarely ready for training as-is.

Common preprocessing steps:

Convert raw data → usable features.
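A minimal sketch of turning raw records into features, using only the Python standard library. The records, the field names (`age`, `city`), and the three steps chosen here (mean imputation, standardisation, one-hot encoding) are illustrative assumptions, not a fixed recipe.

```python
from statistics import mean, pstdev

# Hypothetical raw records: one numeric field with a missing value,
# one categorical field.
raw_rows = [
    {"age": 25.0, "city": "Paris"},
    {"age": None, "city": "Lagos"},
    {"age": 35.0, "city": "Paris"},
]


def preprocess(rows):
    """Impute missing ages with the mean, standardise them, one-hot encode city."""
    # 1. Impute: replace missing ages with the mean of the known ones.
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    fill = mean(known_ages)
    ages = [r["age"] if r["age"] is not None else fill for r in rows]

    # 2. Scale: zero mean, unit variance (population standard deviation).
    mu = mean(ages)
    sigma = pstdev(ages) or 1.0  # guard against a constant column

    # 3. Encode: one column per distinct city, in sorted order.
    cities = sorted({r["city"] for r in rows})

    features = []
    for age, row in zip(ages, rows):
        one_hot = [1.0 if row["city"] == c else 0.0 for c in cities]
        features.append([(age - mu) / sigma] + one_hot)
    return features, cities


features, cities = preprocess(raw_rows)
```

In a real pipeline the imputation value and scaling statistics would normally be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out examples.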

Real-world data is often imbalanced.

Example:

If left untreated, the model becomes biased toward the majority class.

Common techniques:

Goal: Ensure the model learns fairly from all classes.
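Two widely used ways of handling imbalance are class weighting and random oversampling. The sketch below shows both in plain Python. The 80/20 label split is made-up data, and the function names `class_weights` and `oversample` are hypothetical helpers, not part of any library.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


labels = [0] * 8 + [1] * 2  # made-up 80/20 imbalance
rows = list(range(10))      # stand-ins for feature vectors

weights = class_weights(labels)
balanced_rows, balanced_labels = oversample(rows, labels)
```

Weights leave the dataset unchanged and make minority-class errors cost more in the loss, while oversampling changes the dataset itself. Oversampling should be applied to the training split only, otherwise duplicated rows can end up in the evaluation data.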

Model training is where learning happens.

Training is iterative, not one-time.
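To make the iterative nature concrete, here is a small sketch of batch gradient descent fitting a line to four points. The data, the learning rate, and the epoch count are assumptions picked so the example converges; real training loops add batching, validation checks, and early stopping.

```python
def train(xs, ys, lr=0.05, epochs=1000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # loss recorded before each update
    for _ in range(epochs):
        # Forward pass: current predictions and their errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)

        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n

        # Update step: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

w, b, history = train(xs, ys)
```

Each pass repeats the same three steps (predict, measure the error, adjust the parameters), and the recorded `history` shows the loss shrinking as `w` and `b` approach 2 and 1.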

Training accuracy alone is misleading: a model can memorise its training set and still fail on new inputs.

Models must be evaluated on unseen data.

Common metrics:

This tells us whether the model generalises beyond the training data.
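A rough sketch of evaluating on held-out data, again in plain Python. The helper names `train_test_split` and `binary_metrics` are hypothetical, the labels and predictions are made up, and accuracy, precision, recall, and F1 are shown as typical choices for binary classification rather than a complete list.

```python
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out a fraction of the examples for evaluation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hold out a quarter of eight placeholder examples.
rows = list(range(8))
labels = [0, 1, 0, 1, 0, 1, 0, 1]
x_train, x_test, y_train, y_test = train_test_split(rows, labels)

# Made-up true labels and predictions on unseen examples.
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

The model is fitted on the training portion only, and the metrics are computed on the held-out portion. On imbalanced data, precision and recall are usually more informative than accuracy, since always predicting the majority class can score high accuracy while missing every minority example.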