<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Neural Networks on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/neural-networks/</link><description>Recent content in Neural Networks on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/neural-networks/index.xml" rel="self" type="application/rss+xml"/><item><title>Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/</guid><description>&lt;h1 id="neural-networks">
 Neural Networks
 
 &lt;a class="anchor" href="#neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>A &lt;strong>network of artificial neurons&lt;/strong> inspired by how neurons function in the &lt;strong>human brain&lt;/strong>.&lt;/li>
&lt;li>At its core - a &lt;strong>mathematical model&lt;/strong> designed to process and learn from data.&lt;/li>
&lt;li>Neural networks form the &lt;strong>foundation of &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">Deep Learning&lt;/a>&lt;/strong> (involves training large and complex networks on vast amounts of data).&lt;/li>
&lt;/ul>
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 subgraph subGraph0[&amp;#34;Input Layer&amp;#34;]
 I1((&amp;#34;Input 1&amp;#34;))
 I2((&amp;#34;Input 2&amp;#34;))
 I3((&amp;#34;Input 3&amp;#34;))
 end
 subgraph subGraph1[&amp;#34;Hidden Layer&amp;#34;]
 H1((&amp;#34;Hidden 1&amp;#34;))
 H2((&amp;#34;Hidden 2&amp;#34;))
 H3((&amp;#34;Hidden 3&amp;#34;))
 end
 subgraph subGraph2[&amp;#34;Output Layer&amp;#34;]
 O((&amp;#34;Output&amp;#34;))
 end
 I1 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 I2 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 I3 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 H1 --&amp;gt; O
 H2 --&amp;gt; O
 H3 --&amp;gt; O

 style I1 fill:#C8E6C9
 style I2 fill:#C8E6C9
 style I3 fill:#C8E6C9
 style H1 stroke:#2962FF,fill:#BBDEFB
 style H2 fill:#BBDEFB
 style H3 fill:#BBDEFB
 style O fill:#FFCDD2
 style subGraph0 stroke:none,fill:transparent
 style subGraph1 stroke:none,fill:transparent
 style subGraph2 stroke:none,fill:transparent
&lt;/pre>
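&lt;p>As a rough illustration, a forward pass through the 3-3-1 network drawn above can be sketched in a few lines of NumPy. The random weights, zero biases, and the ReLU hidden activation are assumptions made for the sketch; the diagram itself does not specify them.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

# Minimal sketch of the 3-3-1 network above (illustrative values only).
rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])    # Input 1..3
W1 = rng.normal(size=(3, 3))      # weights from input layer to hidden layer
b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3))      # weights from hidden layer to output
b2 = np.zeros(1)

h = np.maximum(0.0, W1 @ x + b1)  # hidden layer (ReLU assumed)
y = W2 @ h + b2                   # single output node
print(y)
&lt;/code>&lt;/pre>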

&lt;hr>
&lt;h3 id="structure-of-a-neural-network">
 Structure of a Neural Network
 
 &lt;a class="anchor" href="#structure-of-a-neural-network">#&lt;/a>
 
&lt;/h3>
&lt;p>A typical neural network has &lt;strong>three main layers&lt;/strong>:&lt;/p></description></item><item><title>Artificial Neuron and Perceptron</title><link>https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/</guid><description>&lt;h1 id="artificial-neuron-and-perceptron">
 Artificial Neuron and Perceptron
 
 &lt;a class="anchor" href="#artificial-neuron-and-perceptron">#&lt;/a>
 
&lt;/h1>
&lt;blockquote class="book-hint info">
&lt;p>Knowledge in neural networks is stored in &lt;strong>connection weights&lt;/strong>, and learning means &lt;strong>modifying those weights&lt;/strong>.&lt;/p>
&lt;/blockquote>
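&lt;p>A minimal sketch of that idea, using the classic perceptron update rule on a single training example; the NumPy encoding and the learning rate are assumptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def perceptron_update(w, b, x, target, lr=0.1):
    """One learning step: the knowledge lives in (w, b)."""
    z = np.dot(w, x) + b           # weighted sum of inputs
    output = np.heaviside(z, 1.0)  # step activation: 1 if z is non-negative
    error = target - output        # Error = Target - Output
    w = w + lr * error * x         # learning = modifying the weights
    b = b + lr * error
    return w, b
&lt;/code>&lt;/pre>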
&lt;hr>
&lt;h2 id="biological-neuron">
 Biological Neuron
 
 &lt;a class="anchor" href="#biological-neuron">#&lt;/a>
 
&lt;/h2>
&lt;p>A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.&lt;/p>
&lt;p>Core components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Dendrites&lt;/strong>: receive signals from other neurons&lt;/li>
&lt;li>&lt;strong>Cell body (soma)&lt;/strong>: processes incoming signals&lt;/li>
&lt;li>&lt;strong>Axon&lt;/strong>: transmits the output signal&lt;/li>
&lt;li>&lt;strong>Synapses&lt;/strong>: connection points between neurons&lt;/li>
&lt;/ul>
&lt;p>Biological intuition:&lt;/p>
&lt;ul>
&lt;li>many inputs arrive at one neuron&lt;/li>
&lt;li>one neuron can connect out to many neurons&lt;/li>
&lt;li>massive parallelism enables fast perception and recognition&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="artificial-neuron">
 Artificial Neuron
 
 &lt;a class="anchor" href="#artificial-neuron">#&lt;/a>
 
&lt;/h2>
&lt;p>An artificial neuron is a simplified computational model inspired by biological neurons.&lt;/p></description></item><item><title>Deep Learning</title><link>https://arshadhs.github.io/docs/ai/deep-learning/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/</guid><description>&lt;h1 id="deep-learning">
 Deep Learning
 
 &lt;a class="anchor" href="#deep-learning">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>A subset of Machine Learning (ML).&lt;/li>
&lt;li>Focuses on algorithms inspired by the structure and function of the brain, called &lt;strong>Artificial Neural Networks&lt;/strong>.&lt;/li>
&lt;li>A &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/">neural network&lt;/a> with multiple hidden layers and multiple nodes in each hidden layer is known as a deep learning system or a deep neural network.&lt;/li>
&lt;li>Allows systems to &lt;strong>automatically learn hierarchical representations&lt;/strong> (features) from raw input, such as images, sound, or text.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Take binary or discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take high-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Multiple stages: 
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

 for each layer 
&lt;span>
 \( l \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step Function: Output 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: The output remains 
&lt;span>
 \( z \)
 &lt;/span>

 (no non-linear change)&lt;/td>
 &lt;td>Sigmoid: Maps 
&lt;span>
 \( z \)
 &lt;/span>

 to a probability between 0 and 1&lt;/td>
 &lt;td>ReLU for hidden layers; Softmax/Sigmoid for the output layer&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>Mean Squared Error (MSE): 
&lt;span>
 \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE): penalises based on probability distance&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy for multiple classes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update weights only on misclassification&lt;/td>
 &lt;td>Gradient Descent: compute gradients of the loss and update weights iteratively&lt;/td>
 &lt;td>Backpropagation: compute error signals 
&lt;span>
 \( \delta \)
 &lt;/span>

 and gradients 
&lt;span>
 \( dW \)
 &lt;/span>

&lt;/td>
 &lt;td>Backpropagation: recursive chain rule to update all hidden layer weights&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Discrete Boolean value (0 or 1)&lt;/td>
 &lt;td>Continuous numerical value (e.g., house prices)&lt;/td>
 &lt;td>Single probability score or class label&lt;/td>
 &lt;td>A vector of probabilities for multiple classes&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
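&lt;p>To make the table concrete, here is a hedged NumPy sketch of the Binary Classification (Logistic) column, run on a single example; the feature values, label, and learning rate are made-up assumptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

x = np.array([0.2, 1.5])   # 1. input: numerical features
y = 1.0                    #    target label
W, b = np.zeros(2), 0.0

z = np.dot(W, x) + b                               # 2. weighted sum z = Wx + b
p = 1.0 / (1.0 + np.exp(-z))                       # 3. sigmoid activation
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # 4. binary cross-entropy
dW, db = (p - y) * x, (p - y)                      # 5. gradients (delta = p - y)
W, b = W - 0.1 * dW, b - 0.1 * db                  #    gradient-descent update
print(p, loss)                                     # 6. probability score
&lt;/code>&lt;/pre>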
&lt;hr>




&lt;ul>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/">Neural Networks&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/">Artificial Neuron and Perceptron&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/">LNN for Regression&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/">Gradient Descent Algorithm&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/">LNN for Classification&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/">Deep Feedforward Neural Networks (DFNN) for Classification&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/">Convolutional Neural Networks&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/">Deep CNN Architectures&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/">CNN Pipeline&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/">Recurrent Neural Networks&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/">Deep Recurrent Neural Networks&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/">Attention Mechanism&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/090-transformer/">Transformer&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/">Optimisation of Deep models&lt;/a>&lt;/li>
 &lt;li>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/">Regularisation for Deep models&lt;/a>&lt;/li>
&lt;/ul>


&lt;hr>


&lt;pre class="mermaid">
flowchart LR
 %% Input Layer
 subgraph subGraph0[&amp;#34;Input Layer&amp;#34;]
 I1((&amp;#34;Input 1&amp;#34;))
 I2((&amp;#34;Input 2&amp;#34;))
 I3((&amp;#34;Input 3&amp;#34;))
 end

 %% Hidden Layers
 subgraph subGraph1[&amp;#34;Hidden Layer 1&amp;#34;]
 H1a((&amp;#34;H1-1&amp;#34;))
 H1b((&amp;#34;H1-2&amp;#34;))
 H1c((&amp;#34;H1-3&amp;#34;))
 end

 subgraph subGraph2[&amp;#34;Hidden Layer 2&amp;#34;]
 H2a((&amp;#34;H2-1&amp;#34;))
 H2b((&amp;#34;H2-2&amp;#34;))
 H2c((&amp;#34;H2-3&amp;#34;))
 end

 subgraph subGraph3[&amp;#34;Hidden Layer 3&amp;#34;]
 H3a((&amp;#34;H3-1&amp;#34;))
 H3b((&amp;#34;H3-2&amp;#34;))
 H3c((&amp;#34;H3-3&amp;#34;))
 end

 %% Output Layer
 subgraph subGraph4[&amp;#34;Output Layer&amp;#34;]
 O((&amp;#34;Output&amp;#34;))
 end

 %% Connections: Input to Hidden Layer 1
 I1 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I2 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I3 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c

 %% Connections: Hidden Layer 1 to Hidden Layer 2
 H1a --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1b --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1c --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c

 %% Connections: Hidden Layer 2 to Hidden Layer 3
 H2a --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2b --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2c --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c

 %% Connections: Hidden Layer 3 to Output
 H3a --&amp;gt; O
 H3b --&amp;gt; O
 H3c --&amp;gt; O

 %% Styling
 style I1 fill:#C8E6C9
 style I2 fill:#C8E6C9
 style I3 fill:#C8E6C9
 style H1a fill:#BBDEFB
 style H1b fill:#BBDEFB
 style H1c fill:#BBDEFB
 style H2a fill:#90CAF9
 style H2b fill:#90CAF9
 style H2c fill:#90CAF9
 style H3a fill:#64B5F6
 style H3b fill:#64B5F6
 style H3c fill:#64B5F6
 style O fill:#FFCDD2
 style subGraph0 stroke:none,fill:transparent
 style subGraph1 stroke:none,fill:transparent
 style subGraph2 stroke:none,fill:transparent
 style subGraph3 stroke:none,fill:transparent
 style subGraph4 stroke:none,fill:transparent
&lt;/pre>
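&lt;p>A minimal sketch of the layer-wise rule &lt;span> \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \) &lt;/span> applied to the 3-3-3-3-1 network above; the random weights and the choice of ReLU for the hidden layers are assumptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 3, 3, 3, 1]   # input, three hidden layers, output
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

a = np.array([0.5, -1.2, 3.0])   # a[0] = input
for i, (W, b) in enumerate(params):
    z = W @ a + b                               # z[l] = W[l] a[l-1] + b[l]
    is_output = (i == len(params) - 1)
    a = z if is_output else np.maximum(0.0, z)  # ReLU on hidden layers only
print(a)
&lt;/code>&lt;/pre>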

&lt;hr>
&lt;h2 id="types-of-neural-networks">
 Types of Neural Networks
 
 &lt;a class="anchor" href="#types-of-neural-networks">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Standard NN - used for smaller and simpler data (e.g. Real Estate)&lt;/li>
&lt;li>CNN - Convolution - used for Images (e.g. Photo Tagging, Object Detection)&lt;/li>
&lt;li>RNN - Recurrent - used for Text (e.g. Speech Recognition, Translation)&lt;/li>
&lt;li>Hybrid NN (e.g. Autonomous Driving)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="components-of-dl">
 Components of DL
 
 &lt;a class="anchor" href="#components-of-dl">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Data&lt;/li>
&lt;li>Learning Algorithm : How to transform data&lt;/li>
&lt;li>&lt;strong>Loss Function&lt;/strong>: an objective function that &lt;strong>quantifies how well the model is doing&lt;/strong>; the lower the loss, the better the model is learning.&lt;/li>
&lt;li>&lt;strong>Optimisation Algorithm&lt;/strong>: searches for the parameters that &lt;strong>minimise the loss function&lt;/strong>. Popular optimisation algorithms for deep learning are based on an approach called &lt;strong>gradient descent&lt;/strong> (see the sketch after this list).&lt;/li>
&lt;li>Model&lt;/li>
&lt;/ul>
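&lt;p>A toy NumPy sketch of how these components interact, fitting a one-feature linear model by gradient descent; the data points, learning rate, and iteration count are assumptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])   # data
Y = np.array([2.1, 3.9, 6.2, 8.1])   # targets, roughly y = 2x
w0, w1 = 0.0, 0.0                    # model parameters

for _ in range(200):
    y_hat = w0 + w1 * X                    # model prediction
    loss = np.mean((Y - y_hat) ** 2) / 2   # loss: how badly are we doing?
    dw0 = np.mean(y_hat - Y)               # gradients of the loss...
    dw1 = np.mean((y_hat - Y) * X)
    w0 -= 0.05 * dw0                       # ...drive the optimisation step
    w1 -= 0.05 * dw1
print(w0, w1, loss)
&lt;/code>&lt;/pre>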
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures-1">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures-1">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Binary/discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>High-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step: 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: output = 
&lt;span>
 \( z \)
 &lt;/span>

&lt;/td>
 &lt;td>Sigmoid: maps 
&lt;span>
 \( z \)
 &lt;/span>

 to probability&lt;/td>
 &lt;td>ReLU (hidden), Softmax/Sigmoid (output)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>
&lt;span>
 \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE)&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update on misclassification&lt;/td>
 &lt;td>Gradient Descent&lt;/td>
 &lt;td>Backpropagation (single layer)&lt;/td>
 &lt;td>Backpropagation (multi-layer chain rule)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Boolean (0 or 1)&lt;/td>
 &lt;td>Continuous value&lt;/td>
 &lt;td>Probability score&lt;/td>
 &lt;td>Probability vector (multi-class)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
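&lt;p>For reference, the activations named in row 3 of the table are one-liners in NumPy; this is a sketch of the standard definitions, not any particular library's API:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def step(z):     return np.heaviside(z, 1.0)      # perceptron
def identity(z): return z                         # linear regression
def sigmoid(z):  return 1.0 / (1.0 + np.exp(-z))  # logistic
def relu(z):     return np.maximum(0.0, z)        # DFNN hidden layers
def softmax(z):                                   # DFNN multi-class output
    e = np.exp(z - z.max())                       # subtract max for stability
    return e / e.sum()
&lt;/code>&lt;/pre>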
&lt;hr>
&lt;h2 id="applications">
 Applications
 
 &lt;a class="anchor" href="#applications">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Computer Vision (e.g., face detection, medical imaging)&lt;/li>
&lt;li>Natural Language Processing (e.g., ChatGPT, translation)&lt;/li>
&lt;li>Self Driving Cars&lt;/li>
&lt;li>Speech Assistants (e.g., Alexa, Siri)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="intution">
 Intution
 
 &lt;a class="anchor" href="#intution">#&lt;/a>
 
&lt;/h2>
&lt;p>Deep Learning is the methodology; a DNN is the model.&lt;/p></description></item><item><title>Attention Mechanism</title><link>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</guid><description>&lt;h1 id="attention-mechanism">
 Attention Mechanism
 
 &lt;a class="anchor" href="#attention-mechanism">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Queries, Keys, and Values&lt;/li>
&lt;li>Attention Pooling by Similarity&lt;/li>
&lt;li>Attention Pooling via Nadaraya–Watson Regression&lt;/li>
&lt;li>Attention Scoring Functions&lt;/li>
&lt;li>Dot Product Attention&lt;/li>
&lt;li>Convenience Functions&lt;/li>
&lt;li>Scaled Dot Product Attention (see the sketch after this list)&lt;/li>
&lt;li>Additive Attention&lt;/li>
&lt;li>Bahdanau Attention Mechanism&lt;/li>
&lt;li>Multi-Head Attention&lt;/li>
&lt;li>Self-Attention&lt;/li>
&lt;li>Positional Encoding&lt;/li>
&lt;li>Code implementation (webinar)&lt;/li>
&lt;/ul>
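&lt;p>Of the items above, scaled dot product attention is compact enough to sketch directly: it computes &lt;span> \( \mathrm{softmax}(Q K^{\top} / \sqrt{d}) V \) &lt;/span>. The toy shapes below (4 queries and keys, dimension 8) are assumptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # similarity of each query to each key
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)   # row-wise softmax
    return w @ V                    # attention pooling over the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
&lt;/code>&lt;/pre>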
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>, Cambridge University Press (&lt;a href="https://d2l.ai/chapter_builders-guide/model-construction.html">Ch 10&lt;/a>, &lt;a href="https://d2l.ai/chapter_convolutional-neural-networks/index.html">Ch 7&lt;/a>)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item><item><title>Optimisation of Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</guid><description>&lt;h1 id="optimisation-of-deep-models">
 Optimisation of Deep models
 
 &lt;a class="anchor" href="#optimisation-of-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Goal of Optimization&lt;/li>
&lt;li>Optimization Challenges in Deep Learning&lt;/li>
&lt;li>Gradient Descent&lt;/li>
&lt;li>Stochastic Gradient Descent&lt;/li>
&lt;li>Minibatch Stochastic Gradient Descent&lt;/li>
&lt;li>Momentum (see the sketch after this list)&lt;/li>
&lt;li>Adagrad and Algorithm&lt;/li>
&lt;li>RMSProp and Algorithm&lt;/li>
&lt;li>Adadelta and Algorithm&lt;/li>
&lt;li>Adam and Algorithm&lt;/li>
&lt;li>Code Implementation and comparison of algorithms (webinar)&lt;/li>
&lt;/ul>
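&lt;p>A hedged sketch of three of the update rules listed above (plain SGD, SGD with momentum, and Adam); the hyperparameter defaults are common textbook values, not prescriptions:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def sgd(w, grad, lr=0.1):
    return w - lr * grad                 # step straight down the gradient

def sgd_momentum(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v + grad                  # velocity accumulates past gradients
    return w - lr * v, v

def adam(w, m, v, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction, t = step number
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
&lt;/code>&lt;/pre>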
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>, Cambridge University Press (Ch 12)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item><item><title>Regularisation for Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</guid><description>&lt;h1 id="regularisation-for-deep-models">
 Regularisation for Deep models
 
 &lt;a class="anchor" href="#regularisation-for-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Generalization for regression&lt;/li>
&lt;li>Training Error and Generalization Error&lt;/li>
&lt;li>Underfitting or Overfitting&lt;/li>
&lt;li>Model Selection&lt;/li>
&lt;li>Weight Decay and Norms&lt;/li>
&lt;li>Generalization in Classification&lt;/li>
&lt;li>Environment and Distribution Shift&lt;/li>
&lt;li>Generalization in Deep Learning&lt;/li>
&lt;li>Dropout (see the sketch after this list)&lt;/li>
&lt;li>Batch Normalization&lt;/li>
&lt;li>Layer Normalization&lt;/li>
&lt;li>Code implementation (webinar)&lt;/li>
&lt;/ul>
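&lt;p>A minimal sketch of (inverted) dropout from the list above; the drop probability and the rescale-at-training-time convention are the usual textbook choices, assumed here:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p, rescale the survivors."""
    if not training or p == 0.0:
        return a                     # inference: identity, no masking
    rng = np.random.default_rng()
    mask = rng.binomial(1, 1.0 - p, size=a.shape)   # keep with prob 1 - p
    return a * mask / (1.0 - p)      # rescale so the expected value is unchanged
&lt;/code>&lt;/pre>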
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>, Cambridge University Press (&lt;a href="https://d2l.ai/chapter_introduction/index.html">T1 – Ch 3.6, 3.7; Ch 4.6, 4.7; Ch 5.5, 5.6; Ch 8.5; Ch 11.7&lt;/a>)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item></channel></rss>