Generative Artificial Intelligence (GenAI) refers to a class of AI systems that can generate new content such as text, images, audio, video, or code, rather than only making predictions or classifications.
GenAI systems learn patterns and representations from large datasets and use them to produce novel outputs that resemble the data they were trained on.
The AI pipeline is an iterative process in which data is collected and prepared, models are trained and evaluated, and the deployed system is monitored and continuously improved.
timeline
title AI Pipeline
Collect Data : Data Ingestion
: Data Understanding
Prepare Data : Cleaning
: Feature Engineering
: Sampling
Train Model : Model Training
: Validation & Metrics
Deploy Model : Deployment
: Monitoring & Retraining
Linear Regression is a supervised ML method used to predict a numerical target by fitting a model that is linear in its parameters.
In ML, linear models are a core baseline:
they’re fast, often surprisingly strong, and usually easy to interpret.
Key takeaway:
Linear Regression learns parameters by minimising a squared-error cost.
You can solve it directly (closed form) or iteratively (gradient descent),
and you can extend it using basis functions and regularisation.
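The takeaway can be written out in symbols. The notation here is an assumption for illustration, not taken from the notes: $X$ is the design matrix, $y$ the target vector, $w$ the parameter vector, $n$ the number of examples, $\alpha$ a learning rate, $\lambda \ge 0$ a regularisation strength, and $\Phi$ the matrix whose rows are basis-function features $\phi(x_i)$. Other texts scale the cost by $\tfrac{1}{2}$ or $\tfrac{1}{2n}$, which changes the gradient's constant but not the minimiser.

$$
\begin{aligned}
J(w) &= \frac{1}{n}\,\lVert Xw - y\rVert^2 && \text{(squared-error cost)}\\
w^\ast &= (X^\top X)^{-1} X^\top y && \text{(closed form, if } X^\top X \text{ is invertible)}\\
w &\leftarrow w - \alpha\,\frac{2}{n}\,X^\top (Xw - y) && \text{(one gradient descent step)}\\
J_\lambda(w) &= \frac{1}{n}\,\lVert \Phi w - y\rVert^2 + \lambda\,\lVert w\rVert^2 && \text{(basis functions with an L2 penalty)}
\end{aligned}
$$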
A random variable is a way to attach numbers to outcomes of a random experiment.
It lets us move from:
“what happened?”
to:
“what number should we analyse?”
Key takeaway:
A random variable is a function from the sample space to real numbers.
Once you define the random variable clearly, the rest (pmf/pdf/cdf, mean, variance) becomes systematic.
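As a minimal sketch of this idea, assume the experiment is two fair coin tosses and the random variable counts heads. The names `sample_space` and `num_heads` are made up for this example. Once the function is written down, the pmf, mean, and variance follow mechanically.

```python
from fractions import Fraction
from itertools import product

# Sample space of the assumed experiment: two fair coin tosses.
# Each outcome is a tuple such as ("H", "T").
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """Random variable X: maps an outcome to a number (count of heads)."""
    return outcome.count("H")

# Build the pmf of X. Every outcome is equally likely (probability 1/4).
pmf = {}
for outcome in sample_space:
    x = num_heads(outcome)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(sample_space))

# Mean and variance follow directly from the pmf.
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(pmf)       # P(X=0)=1/4, P(X=1)=1/2, P(X=2)=1/4
print(mean)      # 1
print(variance)  # 1/2
```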
flowchart TD
PD["Probability<br/>distributions"] --> RV["Random<br/>variables"]
RV --> T["Types"]
T --> RV1["Discrete<br/>RVs"]
T --> RV2["Continuous<br/>RVs"]
RV --> F["PMF / PDF / CDF"]
RV --> S["Mean / Variance<br/>Covariance"]
RV --> J["Joint & Marginal<br/>distributions"]
RV --> X["Transformations"]
style PD fill:#90CAF9,stroke:#1E88E5,color:#000
style RV fill:#90CAF9,stroke:#1E88E5,color:#000
style T fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style S fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style X fill:#CE93D8,stroke:#8E24AA,color:#000
style RV1 fill:#CE93D8,stroke:#8E24AA,color:#000
style RV2 fill:#CE93D8,stroke:#8E24AA,color:#000
Once you can describe a random variable using a pmf or pdf, the next step is to use
named distributions that appear repeatedly in real data and in ML models.
Key takeaway:
Named distributions give you ready-made probability models for common patterns:
binary outcomes, counts, and measurement noise.
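The three patterns above are often modelled with the Bernoulli, Poisson, and normal distributions respectively. That pairing is an assumption here, since the notes have not yet named specific distributions. The sketch below writes each one as a plain function using only the standard library.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for a binary outcome with success probability p."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p

def poisson_pmf(k, lam):
    """P(X = k) for a count with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian, a common model for measurement noise."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# A pmf sums to 1 over its support (truncated here at k = 49 for the Poisson).
print(bernoulli_pmf(0, 0.3) + bernoulli_pmf(1, 0.3))
print(sum(poisson_pmf(k, 2.0) for k in range(50)))
print(normal_pdf(0.0))
```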
Direct solution method - Ordinary Least Squares and the Line of Best Fit
Revision: OLS is the direct method for linear regression. It finds the best-fit line by minimising the sum of squared residuals without iterative updates.
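For a single input feature, the direct solution reduces to two formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below assumes that one-feature case. The function name `fit_ols` and the toy data are invented for illustration.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross-deviations and squared deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_ols(xs, ys)
print(slope, intercept)  # recovers slope 2 and intercept 1
```

No loop over training steps is needed: the parameters come out of a single calculation.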
Revision: A cost function converts model error into a single number. Training means changing the model parameters until this number becomes as small as possible.
A machine learning model needs a way to decide whether one set of parameters is better than another.
For linear regression, every possible value of the parameters gives a different line.
The cost function tells us which line is better by measuring how far the predictions are from the true values.
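To make the comparison concrete, the sketch below scores two candidate lines on the same small dataset. It assumes mean squared error as the cost, which is one common form of squared-error cost. The function name `mse_cost` and the data are invented for this example.

```python
def mse_cost(slope, intercept, xs, ys):
    """Mean squared error of the line y = slope * x + intercept."""
    n = len(xs)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n

# Toy data that lies close to y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

cost_a = mse_cost(2.0, 1.0, xs, ys)  # candidate line A: y = 2x + 1
cost_b = mse_cost(1.0, 3.0, xs, ys)  # candidate line B: y = x + 3

# The line with the lower cost fits the data better.
print(cost_a, cost_b)
```

Line A has the smaller cost, so it is the better of the two candidates on this data.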
Gradient descent is used when we want the model to learn parameters by repeatedly improving them.
For linear regression, it adjusts the slope and intercept until the prediction error becomes small.
flowchart LR
A["Initial Parameters"] --> B["Make Predictions"]
B --> C["Compute Cost"]
C --> D["Compute Gradient"]
D --> E["Update Parameters"]
E --> B
style A fill:#E1F5FE,stroke:#5b7db1,color:#000
style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
style C fill:#FFF9C4,stroke:#b59b3b,color:#000
style D fill:#EDE7F6,stroke:#8a6fb3,color:#000
style E fill:#C8E6C9,stroke:#5f8f6a,color:#000
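The loop in the diagram can be sketched in a few lines. This version assumes a mean squared error cost, one input feature, a fixed learning rate, and a fixed number of steps. Those choices and the toy data are illustrative, not prescribed by the notes.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by repeated gradient steps on the MSE."""
    slope, intercept = 0.0, 0.0  # initial parameters
    n = len(xs)
    cost_history = []
    for _ in range(steps):
        # Make predictions with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Compute the cost (mean squared error).
        cost_history.append(sum(e * e for e in errors) / n)
        # Compute the gradient of the cost for each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Update the parameters by stepping against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept, cost_history

# Toy data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept, cost_history = gradient_descent(xs, ys)
print(slope, intercept)
print(cost_history[0], cost_history[-1])
```

If the learning rate is too large the updates overshoot and the cost grows instead of shrinking, so in practice it is tuned per problem.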
Loss Function: An objective function that quantifies how well the model is doing. The lower the loss, the better the model, so the loss measures how well or badly the model is learning.
Optimisation Algorithm: To reduce the loss, the learning algorithm searches for the parameters that minimise the loss function. Popular optimisation algorithms for deep learning are based on an approach called gradient descent.
Logistic regression predicts the probability that an input belongs to a specific class.
It uses the sigmoid function to convert a linear score of the inputs into a probability value between 0 and 1.
Key takeaway:
Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$,
then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).
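The two pieces of the takeaway, the sigmoid of a linear score and the log-loss, can be sketched directly. The function names and the toy data below are assumptions for illustration. The sketch only evaluates the loss for given parameters and does not include the fitting step.

```python
import math

def sigmoid(z):
    """Map a real score to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """P(y = 1 | x) as the sigmoid of the linear score z = w . x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def log_loss(w, b, X, y):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)

# Toy one-feature data: negative inputs are class 0, positive are class 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

loss_uninformed = log_loss([0.0], 0.0, X, y)  # every prediction is 0.5
loss_better = log_loss([2.0], 0.0, X, y)      # score increases with x
print(loss_uninformed, loss_better)
```

With all-zero parameters every prediction is 0.5 and the loss equals $\ln 2$. Parameters that separate the two classes give a lower loss, which is what maximising the likelihood rewards.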