<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>DNN on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/dnn/</link><description>Recent content in DNN on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://arshadhs.github.io/tags/dnn/index.xml" rel="self" type="application/rss+xml"/><item><title>Attention Mechanism</title><link>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</guid><description>&lt;h1 id="attention-mechanism">
 Attention Mechanism
 
 &lt;a class="anchor" href="#attention-mechanism">#&lt;/a>
 
&lt;/h1>
&lt;p>Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.&lt;/p>
&lt;p>Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of the input representations, giving larger weights to the positions that matter most for the current output.&lt;/p>
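&lt;p>As a minimal sketch of that weighted combination, the NumPy snippet below scores three input vectors against one query with a scaled dot product, turns the scores into weights with a softmax, and averages the values with those weights. The function name &lt;code>attention_pool&lt;/code> and the toy numbers are illustrative assumptions, not taken from the notes.&lt;/p>

```python
import numpy as np

def attention_pool(query, keys, values):
    """Return (context, weights) for one query using scaled dot-product scores."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)               # one similarity score per input token
    scores = scores - scores.max()                   # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax: non-negative, sums to 1
    context = weights @ values                       # weighted combination of the values
    return context, weights

# Hypothetical toy data: 3 input tokens with 2-dimensional keys and values.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
query = np.array([1.0, 0.0])

context, weights = attention_pool(query, keys, values)
print(weights)   # tokens whose keys align with the query get the larger weights
print(context)
```

&lt;p>The context vector is not a copy of any single input. It is a blend whose proportions depend on the query.&lt;/p>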
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Attention answers a simple question:&lt;/p>

&lt;blockquote class='book-hint '>
 &lt;p>For the current prediction, which input tokens should the model focus on most?&lt;/p>
&lt;/blockquote>&lt;/blockquote>
&lt;ul>
&lt;li>Queries, Keys, and Values&lt;/li>
&lt;li>Attention Pooling by Similarity&lt;/li>
&lt;li>Attention Pooling via Nadaraya–Watson Regression&lt;/li>
&lt;li>Attention Scoring Functions&lt;/li>
&lt;li>Dot Product Attention&lt;/li>
&lt;li>Convenience Functions&lt;/li>
&lt;li>Scaled Dot Product Attention&lt;/li>
&lt;li>Additive Attention&lt;/li>
&lt;li>Bahdanau Attention Mechanism&lt;/li>
&lt;li>Multi-Head Attention&lt;/li>
&lt;li>Self-Attention&lt;/li>
&lt;li>Positional Encoding&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="why-attention-is-needed-">
 Why Attention Is Needed ☆
 
 &lt;a class="anchor" href="#why-attention-is-needed-">#&lt;/a>
 
&lt;/h2>
&lt;p>Traditional encoder-decoder RNN models compress the full input sequence into one context vector.&lt;/p></description></item><item><title>Transformer</title><link>https://arshadhs.github.io/docs/ai/deep-learning/090-transformer/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/090-transformer/</guid><description>&lt;h1 id="transformer">
 Transformer
 
 &lt;a class="anchor" href="#transformer">#&lt;/a>
 
&lt;/h1>
&lt;p>A transformer is a neural network architecture that uses attention as its main mechanism for processing sequences.&lt;/p>
&lt;p>Unlike RNNs, transformers do not process tokens one by one.&lt;/p>
&lt;p>They process many tokens in parallel and use self-attention to learn relationships between tokens.&lt;/p>
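&lt;p>The parallel processing described above can be sketched with a single-head self-attention layer in NumPy. Every token is looked up in an embedding table and then compared with every other token in one matrix product, with no loop over positions. The sizes, the random untrained matrices and the names &lt;code>embedding_table&lt;/code>, &lt;code>W_q&lt;/code>, &lt;code>W_k&lt;/code> and &lt;code>W_v&lt;/code> are assumptions made for illustration only.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model = 10, 4                    # hypothetical sizes
embedding_table = rng.normal(size=(vocab_size, d_model))
W_q = rng.normal(size=(d_model, d_model))      # illustrative, untrained projections
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def softmax_rows(x):
    """Softmax applied independently to each row."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(token_ids):
    X = embedding_table[token_ids]             # embedding lookup: (seq_len, d_model)
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)        # every token scored against every token at once
    weights = softmax_rows(scores)             # (seq_len, seq_len), each row sums to 1
    return weights @ V, weights

token_ids = np.array([3, 1, 4, 1])             # a hypothetical tokenised input
outputs, weights = self_attention(token_ids)
print(outputs.shape)   # (4, 4): one output vector per token
print(weights.shape)   # (4, 4): one row of attention weights per token
```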
&lt;ul>
&lt;li>
&lt;p>is an architecture of neural networks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>based on the multi-head attention mechanism&lt;/p>
&lt;/li>
&lt;li>
&lt;p>text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table&lt;/p></description></item><item><title>Optimisation of Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</guid><description>&lt;h1 id="optimisation-of-deep-models">
 Optimisation of Deep models
 
 &lt;a class="anchor" href="#optimisation-of-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;p>Optimisers are algorithms that update neural network parameters to reduce the loss function.&lt;/p>
&lt;p>Deep networks typically have millions or billions of parameters and a non-linear loss surface, so a closed-form solution is generally not available.&lt;/p>
&lt;p>Instead, training uses iterative optimisation.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
An optimiser decides how the model moves through the loss landscape towards lower loss.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;ul>
&lt;li>Goal of Optimisation&lt;/li>
&lt;li>Optimisation Challenges in Deep Learning&lt;/li>
&lt;li>Gradient Descent&lt;/li>
&lt;li>Stochastic Gradient Descent&lt;/li>
&lt;li>Minibatch Stochastic Gradient Descent&lt;/li>
&lt;li>Momentum&lt;/li>
&lt;li>Adagrad and Algorithm&lt;/li>
&lt;li>RMSProp and Algorithm&lt;/li>
&lt;li>Adadelta and Algorithm&lt;/li>
&lt;li>Adam and Algorithm&lt;/li>
&lt;li>Code Implementation and comparison of algorithms (webinar)&lt;/li>
&lt;/ul>
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Optimisers in DNN&amp;#34;] --&amp;gt; B[&amp;#34;Gradient Descent Variants&amp;#34;]
 A --&amp;gt; C[&amp;#34;Momentum-based Optimiser&amp;#34;]
 A --&amp;gt; D[&amp;#34;Adaptive Methods&amp;#34;]
 A --&amp;gt; E[&amp;#34;Learning Rate Schedules&amp;#34;]

 D --&amp;gt; D1[&amp;#34;Parameter-specific learning rates&amp;#34;]

 E --&amp;gt; E1[&amp;#34;Learning rate changes during training&amp;#34;]

 style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
 style B fill:#EDE7F6,stroke:#7E57C2
 style C fill:#C8E6C9,stroke:#43A047
 style D fill:#FFF9C4,stroke:#FBC02D
 style E fill:#F8BBD0,stroke:#D81B60
&lt;/pre>
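&lt;p>Two branches of the diagram, parameter-specific learning rates and learning rate schedules, can be illustrated with a short sketch. It assumes an Adagrad-style update and a simple step-decay schedule. The function names, default values and the toy gradient are hypothetical.&lt;/p>

```python
import numpy as np

def adagrad_step(theta, grad, accum, learning_rate=0.1, eps=1e-8):
    """One Adagrad-style update: each parameter gets its own effective step size."""
    accum = accum + grad ** 2                        # running sum of squared gradients
    theta = theta - learning_rate * grad / (np.sqrt(accum) + eps)
    return theta, accum

def step_decay(initial_rate, epoch, drop=0.5, every=10):
    """A simple schedule: multiply the rate by drop once per block of epochs."""
    return initial_rate * drop ** (epoch // every)

theta = np.array([1.0, 1.0])
accum = np.zeros(2)
grad = np.array([10.0, 0.1])       # hypothetical gradient with very different scales
theta, accum = adagrad_step(theta, grad, accum)
print(theta)                        # both parameters move by about 0.1 despite the 100x gap
print(step_decay(0.1, epoch=25))    # 0.025
```

&lt;p>The adaptive update rescales each coordinate by its own gradient history, while the schedule changes one global rate as training progresses.&lt;/p>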

&lt;hr>
&lt;h2 id="goal-of-optimisation-">
 Goal of Optimisation ☆
 
 &lt;a class="anchor" href="#goal-of-optimisation-">#&lt;/a>
 
&lt;/h2>
&lt;p>The goal is to find parameters 
&lt;span>
 \( \theta \)
 &lt;/span>

 that minimise the loss.&lt;/p></description></item><item><title>Gradient Descent and Mini-Batch Gradient Descent</title><link>https://arshadhs.github.io/docs/ai/deep-learning/101-gradient-descent-and-mini-batch-gd/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/101-gradient-descent-and-mini-batch-gd/</guid><description>&lt;h1 id="optimisation-gradient-descent-and-mini-batch-gradient-descent">
 Optimisation: Gradient Descent and Mini-Batch Gradient Descent
 
 &lt;a class="anchor" href="#optimisation-gradient-descent-and-mini-batch-gradient-descent">#&lt;/a>
 
&lt;/h1>
&lt;p>Gradient descent is the core optimisation idea behind neural network training.
It updates the model parameters by stepping in the direction opposite to the gradient of the loss.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Gradient descent uses the gradient to decide how to change the parameters.
The learning rate controls how large each update step is.&lt;/p>
&lt;/blockquote>
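&lt;p>A minimal sketch of this idea on a one-parameter problem follows. The quadratic loss, the starting point and the learning rate are assumed toy values chosen so the behaviour is easy to check by hand.&lt;/p>

```python
def loss(theta):
    """Toy quadratic loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2

def grad(theta):
    """Derivative of the toy loss with respect to theta."""
    return 2.0 * (theta - 3.0)

theta = 0.0            # initial parameter (assumed)
learning_rate = 0.1    # step size (assumed)

for step in range(50):
    theta = theta - learning_rate * grad(theta)   # move against the gradient

print(theta, loss(theta))   # theta ends close to 3 and the loss close to 0
```

&lt;p>With this loss, each step shrinks the distance to the minimum by a factor of 0.8. A larger learning rate shrinks it faster up to a point, and beyond a rate of 1.0 the iterates diverge.&lt;/p>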
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Gradient Descent Variants&amp;#34;] --&amp;gt; B[&amp;#34;Batch Gradient Descent&amp;#34;]
 A --&amp;gt; C[&amp;#34;Stochastic Gradient Descent&amp;#34;]
 A --&amp;gt; D[&amp;#34;Mini-batch Gradient Descent&amp;#34;]

 B --&amp;gt; B1[&amp;#34;Uses full dataset&amp;#34;]
 B --&amp;gt; B2[&amp;#34;One update per epoch&amp;#34;]
 B --&amp;gt; B3[&amp;#34;Smooth but slow&amp;#34;]

 C --&amp;gt; C1[&amp;#34;Uses one example at a time&amp;#34;]
 C --&amp;gt; C2[&amp;#34;Frequent updates&amp;#34;]
 C --&amp;gt; C3[&amp;#34;Fast but noisy&amp;#34;]

 D --&amp;gt; D1[&amp;#34;Uses small batches&amp;#34;]
 D --&amp;gt; D2[&amp;#34;Efficient on hardware&amp;#34;]
 D --&amp;gt; D3[&amp;#34;Balanced and practical&amp;#34;]

 style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
 style B fill:#EDE7F6,stroke:#7E57C2
 style C fill:#C8E6C9,stroke:#43A047
 style D fill:#FFF9C4,stroke:#FBC02D
&lt;/pre>
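&lt;p>The three variants in the diagram differ only in how many examples feed each update. The sketch below fits a one-feature linear model on assumed noiseless toy data and counts the updates. The &lt;code>train&lt;/code> function and its defaults are hypothetical.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 1))
y = 2.0 * X[:, 0] + 1.0                    # noiseless targets from y = 2x + 1 (toy data)

def train(batch_size, epochs=200, learning_rate=0.1):
    """Fit y = w*x + b by gradient descent; batch_size selects the variant."""
    w, b = 0.0, 0.0
    n = len(X)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)         # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = w * X[idx, 0] + b - y[idx]
            w -= learning_rate * 2.0 * np.mean(error * X[idx, 0])  # d(MSE)/dw
            b -= learning_rate * 2.0 * np.mean(error)              # d(MSE)/db
            updates += 1
    return w, b, updates

for name, batch_size in [("batch", 64), ("stochastic", 1), ("mini-batch", 16)]:
    w, b, updates = train(batch_size)
    print(name, round(w, 3), round(b, 3), updates)
```

&lt;p>With 64 examples and 200 epochs, the batch variant makes 200 updates, the stochastic variant 12800, and the mini-batch variant 800.&lt;/p>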

&lt;hr>
&lt;h2 id="gradient-descent-rule-">
 Gradient Descent Rule ☆
 
 &lt;a class="anchor" href="#gradient-descent-rule-">#&lt;/a>
 
&lt;/h2>
&lt;p>The gradient tells us the direction in which the loss increases fastest.
To reduce the loss, we move in the opposite direction.&lt;/p></description></item><item><title>Momentum Methods</title><link>https://arshadhs.github.io/docs/ai/deep-learning/102-momentum-methods/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/102-momentum-methods/</guid><description>&lt;h1 id="optimisation-momentum-methods">
 Optimisation: Momentum Methods
 
 &lt;a class="anchor" href="#optimisation-momentum-methods">#&lt;/a>
 
&lt;/h1>
&lt;p>Momentum improves gradient descent by adding a memory of previous update directions.
Instead of using only the current gradient, the optimiser accumulates velocity across iterations.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Momentum helps the optimiser move faster in consistent directions and reduces zigzag movement in directions where gradients oscillate.&lt;/p>
&lt;/blockquote>
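&lt;p>One common way to write the velocity update is sketched below on a toy quadratic loss. This is an illustrative formulation and may differ in detail from the one used later in these notes. The coefficient &lt;code>beta&lt;/code>, the learning rate and the loss are assumed values.&lt;/p>

```python
def grad(theta):
    """Gradient of the toy loss (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

theta = 0.0
velocity = 0.0
learning_rate = 0.05   # assumed step size
beta = 0.9             # assumed momentum coefficient

for step in range(200):
    velocity = beta * velocity + grad(theta)      # accumulate past gradients
    theta = theta - learning_rate * velocity      # step along the accumulated direction

print(theta)   # ends close to the minimum at theta = 3
```

&lt;p>Setting &lt;code>beta&lt;/code> to zero recovers plain gradient descent, since the velocity then equals the current gradient.&lt;/p>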
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Momentum-based Optimiser&amp;#34;] --&amp;gt; B[&amp;#34;SGD with Momentum&amp;#34;]

 B --&amp;gt; B1[&amp;#34;Adds velocity term&amp;#34;]
 B --&amp;gt; B2[&amp;#34;Accumulates past gradients&amp;#34;]
 B --&amp;gt; B3[&amp;#34;Reduces zig-zag movement&amp;#34;]
 B --&amp;gt; B4[&amp;#34;Speeds up movement in useful direction&amp;#34;]
 B --&amp;gt; B5[&amp;#34;Helps through shallow regions&amp;#34;]

 B1 --&amp;gt; C1[&amp;#34;Current update depends on previous update&amp;#34;]
 B2 --&amp;gt; C2[&amp;#34;Builds inertia&amp;#34;]
 B3 --&amp;gt; C3[&amp;#34;Smoother path to minimum&amp;#34;]

 style A fill:#C8E6C9,stroke:#43A047,stroke-width:2px
 style B fill:#E1F5FE,stroke:#4A90E2
 style B1 fill:#EDE7F6,stroke:#7E57C2
 style B2 fill:#FFF9C4,stroke:#FBC02D
 style B3 fill:#F8BBD0,stroke:#D81B60
 style B4 fill:#EDE7F6,stroke:#7E57C2
 style B5 fill:#FFF9C4,stroke:#FBC02D
&lt;/pre>

&lt;hr>
&lt;h2 id="physical-intuition-">
 Physical Intuition ☆
 
 &lt;a class="anchor" href="#physical-intuition-">#&lt;/a>
 
&lt;/h2>
&lt;p>Momentum is often explained using the analogy of a ball rolling down a hill.&lt;/p></description></item><item><title>Regularisation for Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</guid><description>&lt;h1 id="regularisation-for-deep-models">
 Regularisation for Deep models
 
 &lt;a class="anchor" href="#regularisation-for-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;p>Regularisation refers to constraints or techniques that discourage a model from becoming overly complex and memorising the training data.&lt;/p>
&lt;p>The goal is not only low training error.&lt;/p>
&lt;p>The goal is good performance on unseen data.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Regularisation helps the model generalise by controlling complexity, stabilising training, and reducing overfitting.&lt;/p>
&lt;/blockquote>
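&lt;p>One concrete way to control complexity is an L2 penalty on the weights, often called weight decay. The sketch below adds such a penalty to a mean squared error. The linear model, the toy data and the strength &lt;code>lam&lt;/code> are assumptions for illustration.&lt;/p>

```python
import numpy as np

def mse(weights, X, y):
    """Mean squared error of a linear model."""
    return np.mean((X @ weights - y) ** 2)

def regularised_loss(weights, X, y, lam):
    """Data loss plus an L2 penalty that grows with the size of the weights."""
    return mse(weights, X, y) + lam * np.sum(weights ** 2)

# Hypothetical toy data and weights.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
weights = np.array([0.5, 0.25])

print(mse(weights, X, y))                         # 0.125
print(regularised_loss(weights, X, y, lam=0.1))   # 0.15625
```

&lt;p>Because the penalty rises with the squared size of the weights, minimising the combined loss favours smaller weights unless larger ones reduce the data loss enough to pay for themselves.&lt;/p>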
&lt;ul>
&lt;li>Generalisation for Regression&lt;/li>
&lt;li>Training Error and Generalisation Error&lt;/li>
&lt;li>Underfitting or Overfitting&lt;/li>
&lt;li>Model Selection&lt;/li>
&lt;li>Weight Decay and Norms&lt;/li>
&lt;li>Generalisation in Classification&lt;/li>
&lt;li>Environment and Distribution Shift&lt;/li>
&lt;li>Generalisation in Deep Learning&lt;/li>
&lt;li>Dropout&lt;/li>
&lt;li>Batch Normalisation&lt;/li>
&lt;li>Layer Normalisation&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="underfitting-good-fit-and-overfitting-">
 Underfitting, Good Fit, and Overfitting ☆
 
 &lt;a class="anchor" href="#underfitting-good-fit-and-overfitting-">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Case&lt;/th>
 &lt;th>Model behaviour&lt;/th>
 &lt;th>Training error&lt;/th>
 &lt;th>Test error&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Underfitting&lt;/td>
 &lt;td>too simple&lt;/td>
 &lt;td>high&lt;/td>
 &lt;td>high&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Good fit&lt;/td>
 &lt;td>captures useful pattern&lt;/td>
 &lt;td>low&lt;/td>
 &lt;td>low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Overfitting&lt;/td>
 &lt;td>memorises training noise&lt;/td>
 &lt;td>very low&lt;/td>
 &lt;td>high&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>


&lt;pre class="mermaid">flowchart LR
 A[&amp;#34;Model Complexity&amp;#34;] --&amp;gt; B[&amp;#34;Too Simple: Underfitting&amp;#34;]
 A --&amp;gt; C[&amp;#34;Just Right: Good Fit&amp;#34;]
 A --&amp;gt; D[&amp;#34;Too Complex: Overfitting&amp;#34;]

 style A fill:#E1F5FE,stroke:#4A90E2
 style B fill:#FFF9C4,stroke:#FBC02D
 style C fill:#C8E6C9,stroke:#43A047
 style D fill:#EDE7F6,stroke:#7E57C2&lt;/pre>
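&lt;p>The pattern in the table can be explored by fitting polynomials of increasing degree to a small noisy sample. This is a sketch under assumed toy data (a sine curve with Gaussian noise). The degrees, sample sizes and the &lt;code>results&lt;/code> dictionary are illustrative choices.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine curve (toy data, assumed for illustration)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(degree, round(train_mse, 4), round(test_mse, 4))
```

&lt;p>Training error can only fall as the degree rises, because each higher-degree fit contains the lower-degree ones. Test error typically falls and then rises again, although the exact numbers depend on the random sample.&lt;/p>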
&lt;hr>
&lt;h2 id="training-error-and-generalisation-error-">
 Training Error and Generalisation Error ☆
 
 &lt;a class="anchor" href="#training-error-and-generalisation-error-">#&lt;/a>
 
&lt;/h2>
&lt;p>Training error measures performance on data used for learning.&lt;/p></description></item><item><title>DNN Formula and Numerical Sheet</title><link>https://arshadhs.github.io/docs/ai/deep-learning/900-dnn-exam-formula-and-numerical-sheet/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/900-dnn-exam-formula-and-numerical-sheet/</guid><description>&lt;h1 id="dnn-formula-and-numerical-sheet">
 DNN Formula and Numerical Sheet
 
 &lt;a class="anchor" href="#dnn-formula-and-numerical-sheet">#&lt;/a>
 
&lt;/h1>
&lt;p>This page consolidates the most useful Deep Neural Networks formulas and numerical patterns for revision.&lt;/p>
&lt;p>It is designed for exam preparation and should be used together with the topic pages.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Revision strategy:&lt;/strong>&lt;br>
Do not only memorise formulas.&lt;/p>
&lt;p>For each formula, know:&lt;/p>
&lt;ol>
&lt;li>what each symbol means&lt;/li>
&lt;li>when to apply it&lt;/li>
&lt;li>how to substitute values carefully&lt;/li>
&lt;li>what the output shape or answer represents&lt;/li>
&lt;/ol>
&lt;/blockquote>
&lt;hr>
&lt;h1 id="1-artificial-neuron">
 1. Artificial Neuron
 
 &lt;a class="anchor" href="#1-artificial-neuron">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="weighted-sum-">
 Weighted Sum ☆
 
 &lt;a class="anchor" href="#weighted-sum-">#&lt;/a>
 
&lt;/h2>
&lt;span style="color: blue;">
 &lt;span>
 \[ 
z = \sum_{i=1}^{n} w_i x_i + b
 \]
 &lt;/span>
&lt;/span>
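&lt;p>A small numerical check of the weighted sum, using hypothetical values for the weights, inputs and bias:&lt;/p>

```python
weights = [0.5, -1.0, 2.0]   # w_1..w_n (hypothetical values)
inputs = [1.0, 2.0, 3.0]     # x_1..x_n (hypothetical values)
bias = 0.5                   # b

# z = sum_i w_i * x_i + b
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(z)   # 5.0, from 0.5 - 2.0 + 6.0 + 0.5
```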
&lt;p>Vector form:&lt;/p></description></item></channel></rss>