April 22, 2026
# Deep Learning

Deep learning is a subset of ML that focuses on algorithms inspired by the structure and function of the brain, called Artificial Neural Networks (ANNs). A neural network with multiple hidden layers, each containing multiple nodes, is known as a deep neural network (a deep learning system). It allows systems to automatically learn hierarchical representations (features) from raw input such as images, sound, or text.

# Operational Steps for Neural Architectures
| Step | Perceptron (Boolean/Logic) | Linear Regression Network | Binary Classification (Logistic) | DFNN / MLP (Classification) |
| --- | --- | --- | --- | --- |
| 1. Input | Binary or discrete inputs \( x_1, \dots, x_n \) | Numerical features \( x \) | Numerical features \( x \) | High-dimensional numerical or categorical features |
| 2. Weighted Sum | Single calculation: \( z = \sum_i w_i x_i + b \) | Single calculation: \( \hat{y} = w_0 + w_1 x \) | Single calculation: \( z = W x + b \) | Multiple stages: \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \) for each layer \( l \) |
| 3. Activation | Step function: output 1 if \( z \geq 0 \), else 0 | Identity: the output remains \( z \) (no non-linear change) | Sigmoid: maps \( z \) to a probability between 0 and 1 | ReLU for hidden layers; Softmax/Sigmoid for the output layer |
| 4. Loss / Error | Error = Target − Output | Mean Squared Error (MSE): \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \) | Binary Cross-Entropy (BCE): penalises predictions by their probability distance from the true label | BCE, or Categorical Cross-Entropy for multiple classes |
| 5. Optimisation | Update weights only on misclassification | Gradient Descent: iteratively compute gradients and update weights | Backpropagation: compute error signals \( \delta \) and gradients \( dW \) | Backpropagation: recursive chain rule to update all hidden-layer weights |
| 6. Output | Discrete Boolean value (0 or 1) | Continuous numerical value (e.g., house prices) | Single probability score or class label | A vector of probabilities over multiple classes |
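To make the table concrete, here is a minimal NumPy sketch of each row's "weighted sum + activation" computation; all weights, biases, and layer sizes are made-up illustrative values, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Perceptron: step activation on a weighted sum of binary inputs
x = np.array([1, 0, 1])                        # binary inputs x_1..x_n
w, b = np.array([0.5, -0.3, 0.8]), -0.6        # illustrative weights/bias
perceptron_out = int(w @ x + b >= 0)           # 1 if z >= 0, else 0

# 2. Linear regression: identity activation, continuous output
x_feat = 3.2
w0, w1 = 1.0, 2.5
y_hat = w0 + w1 * x_feat                       # no non-linearity applied

# 3. Logistic (binary classification): sigmoid squashes z into (0, 1)
z = w @ x + b
p = 1.0 / (1.0 + np.exp(-z))                   # probability of class 1

# 4. DFNN / MLP: one hidden ReLU layer, softmax output over classes
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer shapes are assumptions
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
a1 = np.maximum(0, W1 @ x + b1)                # z^[1], then ReLU
logits = W2 @ a1 + b2                          # z^[2]
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: probability vector
```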
```mermaid
flowchart LR
%% Input Layer
subgraph subGraph0["Input Layer"]
I1(("Input 1"))
I2(("Input 2"))
I3(("Input 3"))
end
%% Hidden Layers
subgraph subGraph1["Hidden Layer 1"]
H1a(("H1-1"))
H1b(("H1-2"))
H1c(("H1-3"))
end
subgraph subGraph2["Hidden Layer 2"]
H2a(("H2-1"))
H2b(("H2-2"))
H2c(("H2-3"))
end
subgraph subGraph3["Hidden Layer 3"]
H3a(("H3-1"))
H3b(("H3-2"))
H3c(("H3-3"))
end
%% Output Layer
subgraph subGraph4["Output Layer"]
O(("Output"))
end
%% Connections: Input to Hidden Layer 1
I1 --> H1a & H1b & H1c
I2 --> H1a & H1b & H1c
I3 --> H1a & H1b & H1c
%% Connections: Hidden Layer 1 to Hidden Layer 2
H1a --> H2a & H2b & H2c
H1b --> H2a & H2b & H2c
H1c --> H2a & H2b & H2c
%% Connections: Hidden Layer 2 to Hidden Layer 3
H2a --> H3a & H3b & H3c
H2b --> H3a & H3b & H3c
H2c --> H3a & H3b & H3c
%% Connections: Hidden Layer 3 to Output
H3a --> O
H3b --> O
H3c --> O
%% Styling
style I1 fill:#C8E6C9
style I2 fill:#C8E6C9
style I3 fill:#C8E6C9
style H1a fill:#BBDEFB
style H1b fill:#BBDEFB
style H1c fill:#BBDEFB
style H2a fill:#90CAF9
style H2b fill:#90CAF9
style H2c fill:#90CAF9
style H3a fill:#64B5F6
style H3b fill:#64B5F6
style H3c fill:#64B5F6
style O fill:#FFCDD2
style subGraph0 stroke:none,fill:transparent
style subGraph1 stroke:none,fill:transparent
style subGraph2 stroke:none,fill:transparent
style subGraph3 stroke:none,fill:transparent
style subGraph4 stroke:none,fill:transparent
```
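The diagram above is a fully connected 3-3-3-3-1 network. Here is a minimal NumPy sketch of its forward pass; the random placeholder weights, ReLU hidden activations, and sigmoid output are assumptions for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(42)
layer_sizes = [3, 3, 3, 3, 1]    # matches the diagram: 3 inputs, 3 hidden layers, 1 output

# Initialise W^[l] and b^[l] for each layer l
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b                     # z^[l] = W^[l] a^[l-1] + b^[l]
        if i == len(params) - 1:
            a = 1.0 / (1.0 + np.exp(-z))  # sigmoid on the output layer
        else:
            a = np.maximum(0, z)          # ReLU on hidden layers
    return a

print(forward(np.array([0.5, -1.2, 0.3])))  # a single probability in (0, 1)
```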
# Types of Neural Networks

- Standard NN: for smaller, simpler structured data (e.g., real estate)
- CNN (Convolutional): used for images (e.g., photo tagging, object detection)
- RNN (Recurrent): used for sequential data such as text (e.g., speech recognition, translation)
- Hybrid NN: combinations of the above (e.g., autonomous driving)

A sketch of what these families look like in code follows below.
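As one possible illustration, a hedged Keras sketch of the first three families; TensorFlow/Keras as the framework and all layer sizes and input shapes are assumptions, not from the original notes.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Standard NN: dense layers for small tabular data (e.g. real-estate features)
standard_nn = tf.keras.Sequential([
    layers.Input(shape=(8,)),             # 8 tabular features (assumption)
    layers.Dense(16, activation="relu"),
    layers.Dense(1),                      # continuous output, e.g. a price
])

# CNN: convolution + pooling for images (e.g. photo tagging)
cnn = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),      # small RGB images (assumption)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# RNN: recurrence over token sequences (e.g. translation, speech)
rnn = tf.keras.Sequential([
    layers.Input(shape=(None,)),          # variable-length token ids
    layers.Embedding(input_dim=10_000, output_dim=64),
    layers.LSTM(64),
    layers.Dense(10, activation="softmax"),
])
```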
# Components of DL

- Data
- Model
- Learning Algorithm: how to transform data into predictions
- Loss Function: an objective function that quantifies how well (or badly) the model is doing; the lower the loss, the better the model.
- Optimisation Algorithm: searches for the best possible parameters for minimising the loss function. Popular optimisation algorithms for deep learning are based on an approach called gradient descent (see the sketch after this list).
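A minimal sketch of this loss-plus-optimisation loop: gradient descent on MSE for a one-parameter linear model \( \hat{y} = w x \). The data and learning rate are made-up illustrative values.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                          # ground truth: the best parameter is w = 2

w, lr = 0.0, 0.05                    # initial parameter and learning rate
for step in range(100):
    y_hat = w * x                    # model prediction
    loss = np.mean((y - y_hat) ** 2)         # loss quantifies how badly we do
    grad = -2 * np.mean((y - y_hat) * x)     # dLoss/dw
    w -= lr * grad                   # step downhill: lower loss, better model

print(w)  # converges towards 2.0
```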
# Applications

- Computer Vision (e.g., face detection, medical imaging)
- Natural Language Processing (e.g., ChatGPT, translation)
- Self-Driving Cars
- Speech Assistants (e.g., Alexa, Siri)

# Intuition
Deep Learning is the methodology; a DNN is a model built with it.