ML

Regression(Linear Models)

Linear Regression #

Linear Regression is a supervised ML method used to predict a numerical target by fitting a model that is linear in its parameters.

In ML , linear models are a core baseline: they’re fast, often surprisingly strong, and usually easy to interpret.

Key takeaway: Linear Regression learns parameters by minimising a squared-error cost. You can solve it directly (closed form) or iteratively (gradient descent), and you can extend it using basis functions and regularisation.

Ordinary Least Squares

Direct solution method - Ordinary Least Squares and the Line of Best Fit #

It is possible to compute the best parameters for linear regression in one shot (closed-form), instead of iteratively improving them step-by-step. fileciteturn34file10turn34file6

For linear regression, the direct method is usually Ordinary Least Squares (OLS).

Ordinary Least Squares (OLS) chooses the “best” line by minimising squared prediction errors.

Key takeaway: OLS defines “best fit” as the line that minimises the total squared residual error across all data points.

Cost Function

Cost Function #

  • also known as an objective function

  • how far the predicted values are from the actual ones

  • measure of the difference between predicted values and actual values

  • quantifies the error between a model’s predicted values and actual values

  • measures the model’s error on a group of datapoints

  • method used to predict values by drawing the best-fit line through the data

  • used to evaluate the accuracy of a model’s predictions

Gradient Descent

Gradient Descent for Linear Regression #

Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.

  • Iterative method
  • Types: batch / stochastic / mini-batch

Key takeaway: Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.

flowchart TD
GD["Gradient<br/>Descent"] -->|minimises| CF["Cost<br/>function"]
GD -->|updates| W["Parameters<br/>(weights)"]
GD -->|uses| GR["Gradient<br/>(slope)"]

GD --> H["Hyperparameters"]
H --> LR["Learning<br/>rate"]
H --> BS["Batch<br/>size"]
H --> EP["Epochs"]

style GD fill:#90CAF9,stroke:#1E88E5,color:#000

style CF fill:#CE93D8,stroke:#8E24AA,color:#000
style W fill:#CE93D8,stroke:#8E24AA,color:#000
style GR fill:#CE93D8,stroke:#8E24AA,color:#000
style H fill:#CE93D8,stroke:#8E24AA,color:#000
style LR fill:#CE93D8,stroke:#8E24AA,color:#000
style BS fill:#CE93D8,stroke:#8E24AA,color:#000
style EP fill:#CE93D8,stroke:#8E24AA,color:#000

Types of GD #

flowchart TD
T["Gradient Descent<br/>types"] --> BGD["Batch<br/>GD"]
T --> SGD["Stochastic<br/>GD"]
T --> MGD["Mini-batch<br/>GD"]

BGD --> ALL["All data<br/>per step"]
BGD --> STB["Smooth<br/>updates"]

SGD --> ONE["1 sample<br/>per step"]
SGD --> FAST["Quick<br/>progress"]
SGD --> NOISE["Noisy<br/>updates"]

MGD --> MB["Small batch<br/>per step"]
MGD --> PRACT["Practical<br/>default"]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style BGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style SGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style MGD fill:#C8E6C9,stroke:#2E7D32,color:#000

style ALL fill:#CE93D8,stroke:#8E24AA,color:#000
style STB fill:#CE93D8,stroke:#8E24AA,color:#000
style ONE fill:#CE93D8,stroke:#8E24AA,color:#000
style FAST fill:#CE93D8,stroke:#8E24AA,color:#000
style NOISE fill:#CE93D8,stroke:#8E24AA,color:#000
style MB fill:#CE93D8,stroke:#8E24AA,color:#000
style PRACT fill:#CE93D8,stroke:#8E24AA,color:#000

Batch #

  • Use only if you have huge compute and a lot of time to train

SGD #

  • go-to solution

Classification(Linear Models)

Linear models for Classification #

  • categorises data by finding a linear boundary (hyperplane) that separates classes
  • calculating a weighted sum of input features plus bias
flowchart TD
T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
T --> LR["Logistic<br/>regression"]
T --> SVM["Linear<br/>SVM"]

P -->|uses| STEP["Step<br/>activation"]
LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
SVM -->|uses| HNG["Hinge<br/>loss"]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000

style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000

Discriminant Functions #

Decision Theory #

Probabilistic Discriminative Classifiers #


Logistic Regression #

  • Supervised machine learning algorithm
  • Binary classification algorithm
  • requires data to be linearly separable
  • predicts the probability that an input belongs to a specific class
  • uses Sigmoid function to convert inputs into a probability value between 0 and 1

Key takeaway: Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$, then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).

Foundation Models

Foundation Model #

AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.

  • are large deep learning neural networks

  • are large AI models trained on massive and diverse datasets (text, images, audio, or multiple modalities).

  • Contain millions or billions of parameters.

  • designed to perform a broad range of general tasks

  • designed for general-purpose intelligence, not a single task.

  • acts as base models for building specialised AI applications

LLM - Model

LLM – Large Language Model #

Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.

They learn language by analysing massive amounts of text data, discovering patterns in:

  • grammar

  • meaning

  • context

  • relationships between words and sentences

  • Built on Deep Learning

  • Implemented using Neural Networks

  • Based on Transformers

  • Often combined with tools like:

    • Retrieval (RAG)
    • Agents
    • External APIs
    • Memory systems

What makes an LLM special? #

  • Built using deep neural networks
  • Trained on very large datasets (books, articles, code, web text)
  • Can perform many tasks without task-specific training
  • General-purpose language understanding, not single-task models

Foundation: Transformer Architecture #

LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.

AI Agents

AI Agents #

Also referred to as Agentic AI.

AI agents are intelligent systems that can plan, make decisions, and take actions to achieve goals with minimal human intervention.

  • A common use case is task automation

  • for example booking travel based on a user’s request.

  • AI agents typically build on Generative AI and use Large Language Models (LLMs) as the reasoning core.

  • Agents often interact with tools (APIs, databases, calendars) to complete multi-step workflows.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) #

Retrieval-Augmented Generation (RAG) is a system design pattern that improves an LLM’s answers by:

  1. Retrieving relevant information from an external knowledge source, and then
  2. Augmenting the LLM prompt with that retrieved context before generating the final response.

RAG helps an LLM look things up first, then answer using evidence.


Why RAG is Useful #

RAG is commonly used when:

  • Your knowledge is in private documents (PDFs, policies, internal wiki)
  • You need up-to-date information (things not in the model’s training data)
  • You want fewer hallucinations by grounding answers in retrieved sources
  • You want traceability (show “where the answer came from”)

RAG does not change the model weights.
It changes what the model sees at inference time by adding retrieved context.

Mathematical Foundation

Mathematical Foundations for Machine Learning #

Machine Learning is built on mathematical principles that allow models to:

  • represent data
  • learn patterns
  • optimise performance
flowchart LR
    DATA[Data]
    MATH[Math Models]
    OPT[Optimisation]
    MODEL[Trained Model]

    DATA --> MATH
    MATH --> OPT
    OPT --> MODEL

ML requires core mathematical tools to understand how ML algorithms work internally. Algebra deals with relationships between variables and quantities, while Calculus focuses on change and optimization.