# Neural Networks
A network of artificial neurons inspired by how neurons function in the human brain. At its core, it is a mathematical model designed to process and learn from data. Neural networks form the foundation of Deep Learning, which involves training large and complex networks on vast amounts of data.
```mermaid
flowchart LR
subgraph subGraph0["Input Layer"]
I1(("Input 1"))
I2(("Input 2"))
I3(("Input 3"))
end
subgraph subGraph1["Hidden Layer"]
H1(("Hidden 1"))
H2(("Hidden 2"))
H3(("Hidden 3"))
end
subgraph subGraph2["Output Layer"]
O(("Output"))
end
I1 --> H1 & H2 & H3
I2 --> H1 & H2 & H3
I3 --> H1 & H2 & H3
H1 --> O
H2 --> O
H3 --> O
style I1 fill:#C8E6C9
style I2 fill:#C8E6C9
style I3 fill:#C8E6C9
style H1 stroke:#2962FF,fill:#BBDEFB
style H2 fill:#BBDEFB
style H3 fill:#BBDEFB
style O fill:#FFCDD2
style subGraph0 stroke:none,fill:transparent
style subGraph1 stroke:none,fill:transparent
style subGraph2 stroke:none,fill:transparent
```
# Structure of a Neural Network
A typical neural network has three main layers: an input layer, one or more hidden layers, and an output layer, as in the diagram above.
# Artificial Neuron and Perceptron
Knowledge in neural networks is stored in connection weights, and learning means modifying those weights.
# Biological Neuron
A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.
Core components:
- Dendrites: receive signals from other neurons
- Cell body (soma): processes incoming signals
- Axon: transmits the output signal
- Synapses: connection points between neurons

Biological intuition:
- many inputs arrive at one neuron
- one neuron can connect out to many neurons
- massive parallelism enables fast perception and recognition

# Artificial Neuron
An artificial neuron is a simplified computational model inspired by biological neurons: it computes a weighted sum of its inputs plus a bias and passes the result through an activation function.
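To make this concrete, here is a minimal NumPy sketch (an illustrative assumption, not from the original notes) of an artificial neuron with a step activation, trained with the perceptron rule from the table below: weights change only on misclassification. The AND-gate data and the learning rate `lr` are assumed for illustration.

```python
import numpy as np

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(x, w, b):
    """Artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b  # z = sum(w_i * x_i) + b
    return step(z)

# Perceptron learning rule: update weights only on misclassification.
# Illustrative task: the Boolean AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1  # lr (learning rate) is an assumed value
for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - neuron(xi, w, b)  # Error = Target - Output
        w += lr * error * xi               # no update when error == 0
        b += lr * error

print([neuron(xi, w, b) for xi in X])  # expected: [0, 0, 0, 1]
```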
# Deep Learning
A subset of ML that focuses on algorithms inspired by the structure and function of the brain, called Artificial Neural Networks. A neural network with multiple hidden layers and multiple nodes in each hidden layer is known as a deep learning system or a deep neural network. It allows systems to automatically learn hierarchical representations (features) from raw input such as images, sound, or text.

# Operational Steps for Neural Architectures
| Step | Perceptron (Boolean/Logic) | Linear Regression Network | Binary Classification (Logistic) | DFNN / MLP (Classification) |
| --- | --- | --- | --- | --- |
| 1. Input | Binary or discrete inputs \( x_1, \dots, x_n \) | Numerical features \( x \) | Numerical features \( x \) | High-dimensional numerical or categorical features |
| 2. Weighted Sum | Single calculation: \( z = \sum (w_i x_i) + b \) | Single calculation: \( \hat{y} = w_0 + w_1 x \) | Single calculation: \( z = W x + b \) | Multiple stages: \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \) for each layer \( l \) |
| 3. Activation | Step function: output 1 if \( z \geq 0 \), else 0 | Identity: the output remains \( z \) (no non-linear change) | Sigmoid: maps \( z \) to a probability between 0 and 1 | ReLU for hidden layers; Softmax/Sigmoid for the output layer |
| 4. Loss / Error | Error = Target − Output | Mean Squared Error (MSE): \( J = \frac{1}{2N} \sum (Y - \hat{y})^2 \) | Binary Cross-Entropy (BCE): penalises based on probability distance | BCE or Categorical Cross-Entropy for multiple classes |
| 5. Optimisation | Update weights only on misclassification | Gradient Descent: compute gradients at each iteration and update weights | Backpropagation: compute error signals \( \delta \) and gradients \( dW \) | Backpropagation: recursive chain rule to update all hidden-layer weights |
| 6. Output | Discrete Boolean value (0 or 1) | Continuous numerical value (e.g., house prices) | Single probability score or class label | A vector of probabilities for multiple classes |
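To make rows 2 and 3 of the table concrete, here is a minimal NumPy sketch of a forward pass through a small MLP: each layer computes \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \), hidden layers apply ReLU, and the output layer applies Softmax. The layer widths (3-4-4-2), random initialisation, and input values are illustrative assumptions.

```python
import numpy as np

def relu(z):
    """ReLU activation for hidden layers."""
    return np.maximum(0, z)

def softmax(z):
    """Softmax for the output layer; subtracting the max aids numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]  # assumed widths: input, two hidden layers, output
params = [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Forward pass: z[l] = W[l] a[l-1] + b[l] at every layer."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b
        a = relu(z) if i < len(params) - 1 else softmax(z)
    return a

x = np.array([0.5, -1.0, 2.0])  # illustrative input features
print(forward(x, params))       # a probability vector summing to 1
```

Swapping the Softmax output for a single Sigmoid unit and a BCE loss turns the same skeleton into the logistic-regression column of the table.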
```mermaid
flowchart LR
%% Input Layer
subgraph subGraph0["Input Layer"]
I1(("Input 1"))
I2(("Input 2"))
I3(("Input 3"))
end
%% Hidden Layers
subgraph subGraph1["Hidden Layer 1"]
H1a(("H1-1"))
H1b(("H1-2"))
H1c(("H1-3"))
end
subgraph subGraph2["Hidden Layer 2"]
H2a(("H2-1"))
H2b(("H2-2"))
H2c(("H2-3"))
end
subgraph subGraph3["Hidden Layer 3"]
H3a(("H3-1"))
H3b(("H3-2"))
H3c(("H3-3"))
end
%% Output Layer
subgraph subGraph4["Output Layer"]
O(("Output"))
end
%% Connections: Input to Hidden Layer 1
I1 --> H1a & H1b & H1c
I2 --> H1a & H1b & H1c
I3 --> H1a & H1b & H1c
%% Connections: Hidden Layer 1 to Hidden Layer 2
H1a --> H2a & H2b & H2c
H1b --> H2a & H2b & H2c
H1c --> H2a & H2b & H2c
%% Connections: Hidden Layer 2 to Hidden Layer 3
H2a --> H3a & H3b & H3c
H2b --> H3a & H3b & H3c
H2c --> H3a & H3b & H3c
%% Connections: Hidden Layer 3 to Output
H3a --> O
H3b --> O
H3c --> O
%% Styling
style I1 fill:#C8E6C9
style I2 fill:#C8E6C9
style I3 fill:#C8E6C9
style H1a fill:#BBDEFB
style H1b fill:#BBDEFB
style H1c fill:#BBDEFB
style H2a fill:#90CAF9
style H2b fill:#90CAF9
style H2c fill:#90CAF9
style H3a fill:#64B5F6
style H3b fill:#64B5F6
style H3c fill:#64B5F6
style O fill:#FFCDD2
style subGraph0 stroke:none,fill:transparent
style subGraph1 stroke:none,fill:transparent
style subGraph2 stroke:none,fill:transparent
style subGraph3 stroke:none,fill:transparent
style subGraph4 stroke:none,fill:transparent
```
# Types of Neural Networks
- Standard NN: small and standard, for smaller and simpler data (e.g. Real Estate)
- CNN (Convolutional): used for images (e.g. Photo Tagging, Object Detection)
- RNN (Recurrent): used for text (e.g. Speech Recognition, Translation)
- Hybrid NN (e.g. Autonomous Driving)

# Components of DL
- Data
- Learning Algorithm: how to transform data
- Loss Function: an objective function that quantifies how well the model is doing; the lower the loss, the better the model
- Optimisation Algorithm: searches for the best possible parameters to minimise the loss function; popular optimisation algorithms for deep learning are based on gradient descent (see the sketch after this list)
- Model
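As a sketch of how the loss function and the optimisation algorithm fit together, the following is plain gradient descent minimising the MSE loss \( J = \frac{1}{2N} \sum (Y - \hat{y})^2 \) for the one-feature linear model \( \hat{y} = w_0 + w_1 x \) from the table above. The synthetic data and learning rate are illustrative assumptions.

```python
import numpy as np

# Synthetic data for y ≈ 2x + 1 (an illustrative assumption)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + 0.1 * rng.standard_normal(100)

w0, w1, lr = 0.0, 0.0, 0.5  # assumed initial weights and learning rate
for _ in range(200):
    y_hat = w0 + w1 * x                  # model prediction
    grad_w0 = -(y - y_hat).mean()        # dJ/dw0 for J = (1/2N) sum (y - y_hat)^2
    grad_w1 = -((y - y_hat) * x).mean()  # dJ/dw1
    w0 -= lr * grad_w0                   # step against the gradient
    w1 -= lr * grad_w1

print(round(w0, 2), round(w1, 2))  # converges to roughly 1.0 and 2.0
```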
# Applications
- Computer Vision (e.g., face detection, medical imaging)
- Natural Language Processing (e.g., ChatGPT, translation)
- Self-Driving Cars
- Speech Assistants (e.g., Alexa, Siri)

# Intuition
Deep Learning is the methodology; a DNN is a model.
# Attention Mechanism
- Queries, Keys, and Values
- Attention Pooling by Similarity
- Attention Pooling via Nadaraya–Watson Regression
- Attention Scoring Functions
- Dot Product Attention
- Convenience Functions
- Scaled Dot Product Attention (see the sketch after this list)
- Additive Attention
- Bahdanau Attention Mechanism
- Multi-Head Attention
- Self-Attention
- Positional Encoding
- Code implementation (webinar)
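A minimal NumPy sketch of scaled dot-product attention, \( \mathrm{softmax}(Q K^\top / \sqrt{d})\, V \): each query is compared to every key, the scores are normalised into attention weights, and the values are pooled by those weights. The shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax; subtracting the max aids numerical stability."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1: attention pooling
    return weights @ V, weights

# Illustrative shapes: 2 queries, 4 key-value pairs, dimension 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (2, 8) and [1. 1.]
```

Dividing by \( \sqrt{d} \) keeps the dot products from growing with the dimension, which would otherwise push the softmax into near one-hot saturation.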
# Reference
Dive into Deep Learning. Cambridge University Press. (Ch. 7, Ch. 10)
# Optimisation of Deep models
- Goal of Optimization
- Optimization Challenges in Deep Learning
- Gradient Descent
- Stochastic Gradient Descent
- Minibatch Stochastic Gradient Descent
- Momentum (see the sketch after this list)
- Adagrad and Algorithm
- RMSProp and Algorithm
- Adadelta and Algorithm
- Adam and Algorithm
- Code Implementation and comparison of algorithms (webinar)
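A hedged sketch of two items from this outline, minibatch stochastic gradient descent and momentum, applied to the same one-feature linear-regression loss used earlier. The batch size, momentum coefficient `beta`, learning rate, and synthetic data are illustrative assumptions.

```python
import numpy as np

# Same illustrative regression task as in the gradient-descent sketch above
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = 2 * x + 1 + 0.1 * rng.standard_normal(1000)

w = np.zeros(2)                 # w[0] = bias, w[1] = slope
v = np.zeros(2)                 # velocity (momentum buffer)
lr, beta, batch = 0.1, 0.9, 32  # assumed hyperparameters

for epoch in range(20):
    idx = rng.permutation(len(x))      # shuffle before each pass
    for start in range(0, len(x), batch):
        j = idx[start:start + batch]   # one minibatch
        err = (w[0] + w[1] * x[j]) - y[j]
        grad = np.array([err.mean(), (err * x[j]).mean()])
        v = beta * v + grad            # momentum: smooth the gradient history
        w -= lr * v

print(w.round(2))  # converges to roughly [1. 2.]
```

Minibatches give cheap, noisy gradient estimates; momentum averages them over time, which damps the noise and speeds progress along consistent directions.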
# Reference
Dive into Deep Learning. Cambridge University Press. (Ch. 12)
# Regularisation for Deep models
- Generalization for Regression
- Training Error and Generalization Error
- Underfitting or Overfitting
- Model Selection
- Weight Decay and Norms
- Generalization in Classification
- Environment and Distribution Shift
- Generalization in Deep Learning
- Dropout (see the sketch after this list)
- Batch Normalization
- Layer Normalization
- Code implementation (webinar)
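A minimal sketch of (inverted) dropout from the outline: during training each activation is zeroed with probability \( p \) and the survivors are rescaled by \( 1/(1-p) \), so no rescaling is needed at test time. The drop probability is an illustrative assumption.

```python
import numpy as np

def dropout(a, p, training=True, rng=np.random.default_rng()):
    """Inverted dropout: zero activations with prob p, rescale survivors."""
    if not training or p == 0:
        return a                       # identity at test time
    mask = rng.random(a.shape) >= p    # keep each unit with prob 1 - p
    return a * mask / (1 - p)          # rescale so the expected output equals the input

a = np.ones(10)
print(dropout(a, p=0.5))  # roughly half zeros, survivors scaled to 2.0
```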