<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Learning on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/deep-learning/</link><description>Recent content in Deep Learning on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>LNN for Regression</title><link>https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/</guid><description>&lt;h1 id="linear-neural-networks-for-regression">
 Linear Neural Networks for Regression
 
 &lt;a class="anchor" href="#linear-neural-networks-for-regression">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>linear neural network for regression&lt;/strong> is a model that predicts a &lt;strong>continuous&lt;/strong> target by taking a weighted sum of input features and applying the &lt;strong>identity activation&lt;/strong> (so the output can be any real number).&lt;/p>
&lt;ul>
&lt;li>Single neuron for regression (predicting &lt;em>how much&lt;/em> / &lt;em>how many&lt;/em>)&lt;/li>
&lt;li>Data + linear model (single neuron, no hidden layers) + squared loss&lt;/li>
&lt;li>Training using the &lt;strong>batch gradient descent&lt;/strong> algorithm&lt;/li>
&lt;li>Prediction (inference)&lt;/li>
&lt;li>E.g. Auto MPG (UCI)-style prediction with a single neuron (from-scratch code; see the sketch after this list)&lt;/li>
&lt;/ul>
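&lt;p>A minimal from-scratch sketch of this setup, assuming NumPy and a synthetic stand-in for the Auto MPG data (the real dataset would be loaded from UCI):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

# Synthetic stand-in for Auto MPG-style data: one feature, one continuous target
rng = np.random.default_rng(0)
X = rng.uniform(1.5, 4.5, size=100)            # e.g. car weight in tonnes
y = 40 - 7 * X + rng.normal(0, 1, 100)         # roughly linear relation to mpg

w, b = 0.0, 0.0                                # single neuron: y_hat = w*x + b
lr = 0.05                                      # learning rate

for epoch in range(2000):
    y_hat = w * X + b                          # forward pass, identity activation
    error = y_hat - y
    grad_w = (error * X).mean()                # dJ/dw for J = 1/(2N) sum(error^2)
    grad_b = error.mean()                      # dJ/db
    w -= lr * grad_w                           # batch gradient descent update
    b -= lr * grad_b

print(w, b)                                    # inference: y_hat = w * x_new + b
&lt;/code>&lt;/pre>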
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 D[&amp;#34;Data&amp;lt;br/&amp;gt;X, y&amp;#34;] --&amp;gt; M[&amp;#34;Linear model&amp;lt;br/&amp;gt;w, b&amp;lt;br/&amp;gt;Single neuron&amp;#34;]
 M --&amp;gt; A[&amp;#34;Activation&amp;lt;br/&amp;gt;Identity&amp;#34;]
 A --&amp;gt; L[&amp;#34;Loss&amp;lt;br/&amp;gt;MSE (Squared error)&amp;#34;]
 L --&amp;gt; O[&amp;#34;Optimiser&amp;lt;br/&amp;gt;Batch GD / Mini-batch GD&amp;#34;]
 O --&amp;gt; P[&amp;#34;Parameters&amp;lt;br/&amp;gt;w, b&amp;#34;]
 P --&amp;gt; I[&amp;#34;Inference&amp;lt;br/&amp;gt;Predict ŷ (number) for new x&amp;#34;]

 %% Pastel colour scheme
 style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
 style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
 style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
 style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
 style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
 style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
 style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
&lt;/pre>

&lt;hr>
&lt;h2 id="regression">
 Regression
 
 &lt;a class="anchor" href="#regression">#&lt;/a>
 
&lt;/h2>
&lt;p>Regression is a supervised learning task that predicts a continuous-valued output based on input features.&lt;/p></description></item><item><title>Gradient Descent Algorithm</title><link>https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/</guid><description>&lt;h1 id="gradient-descent-algorithm">
 Gradient Descent Algorithm
 
 &lt;a class="anchor" href="#gradient-descent-algorithm">#&lt;/a>
 
&lt;/h1>
&lt;p>Gradient Descent Algorithm (GDA) is&lt;/p>
&lt;ul>
&lt;li>an &lt;strong>optimisation method&lt;/strong>&lt;/li>
&lt;li>used to &lt;strong>train models&lt;/strong>&lt;/li>
&lt;li>by repeatedly updating parameters (weights and biases) to &lt;strong>reduce the loss&lt;/strong>&lt;/li>
&lt;/ul>
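&lt;p>One update step, as a minimal sketch (function and variable names here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python"># Move each parameter a small step against its gradient of the loss.
def gd_step(w, b, grad_w, grad_b, learning_rate=0.01):
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    return w, b
&lt;/code>&lt;/pre>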
&lt;blockquote class="book-hint info">
&lt;p>In deep learning, the default training approach is almost always &lt;strong>mini-batch gradient descent&lt;/strong>, usually with &lt;strong>Adam&lt;/strong> or &lt;strong>SGD + momentum&lt;/strong>.&lt;/p>
&lt;/blockquote>
&lt;p>Gradient Descent is &lt;strong>used in both regression and classification&lt;/strong>.&lt;/p>
&lt;p>It’s not tied to the task type — it’s tied to the fact you have:&lt;/p></description></item><item><title>Deep Learning</title><link>https://arshadhs.github.io/docs/ai/deep-learning/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/</guid><description>&lt;h1 id="deep-learning">
 Deep Learning
 
 &lt;a class="anchor" href="#deep-learning">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Subset of ML&lt;/li>
&lt;li>focuses on algorithms, called &lt;strong>Artificial Neural Networks&lt;/strong>, inspired by the structure and function of the brain.&lt;/li>
&lt;li>A &lt;a href="https://arshadhs.github.io/docs/ai/neural-network/">neural network&lt;/a> with multiple hidden layers and multiple nodes in each hidden layer is known as a deep learning system or a deep neural network.&lt;/li>
&lt;li>Allows systems to &lt;strong>automatically learn hierarchical representations&lt;/strong> (features) from raw input, such as images, sound, or text.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Take binary or discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take high-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Multiple stages: 
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

 for each layer 
&lt;span>
 \( l \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step Function: Output 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: The output remains 
&lt;span>
 \( z \)
 &lt;/span>

 (no non-linear change)&lt;/td>
 &lt;td>Sigmoid: Maps 
&lt;span>
 \( z \)
 &lt;/span>

 to a probability between 0 and 1&lt;/td>
 &lt;td>ReLU for hidden layers; Softmax/Sigmoid for the output layer&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>Mean Squared Error (MSE): 
&lt;span>
 \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE): penalises based on probability distance&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy for multiple classes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update weights only on misclassification&lt;/td>
 &lt;td>Gradient Descent: compute gradients of the loss and update weights at each iteration&lt;/td>
 &lt;td>Backpropagation: compute error signals 
&lt;span>
 \( \delta \)
 &lt;/span>

 and gradients 
&lt;span>
 \( dW \)
 &lt;/span>

&lt;/td>
 &lt;td>Backpropagation: recursive chain rule to update all hidden layer weights&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Discrete Boolean value (0 or 1)&lt;/td>
 &lt;td>Continuous numerical value (e.g., house prices)&lt;/td>
 &lt;td>Single probability score or class label&lt;/td>
 &lt;td>A vector of probabilities for multiple classes&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
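&lt;p>The activation row of the table, sketched in NumPy on an illustrative pre-activation vector:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

z = np.array([0.5, -1.2, 2.0])             # pre-activations z = W x + b

step = (z >= 0).astype(int)                # perceptron: hard threshold
identity = z                               # regression: output z unchanged
sigmoid = 1 / (1 + np.exp(-z))             # logistic: probability in (0, 1)
softmax = np.exp(z) / np.exp(z).sum()      # multi-class: probabilities sum to 1
&lt;/code>&lt;/pre>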
&lt;hr>




&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/">Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/">Artificial Neuron and Perceptron&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/">LNN for Regression&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/">Gradient Descent Algorithm&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/">LNN for Classification&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/">Deep Feedforward Neural Networks (DFNN) for Classification&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/">Convolutional Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/">Deep CNN Architectures&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/">CNN Pipeline&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/">Recurrent Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/">Deep Recurrent Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/">Attention Mechanism&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/090-transformer/">Transformer&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/">Optimisation of Deep models&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/">Regularisation for Deep models&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>


&lt;hr>


&lt;pre class="mermaid">
flowchart LR
 %% Input Layer
 subgraph subGraph0[&amp;#34;Input Layer&amp;#34;]
 I1((&amp;#34;Input 1&amp;#34;))
 I2((&amp;#34;Input 2&amp;#34;))
 I3((&amp;#34;Input 3&amp;#34;))
 end

 %% Hidden Layers
 subgraph subGraph1[&amp;#34;Hidden Layer 1&amp;#34;]
 H1a((&amp;#34;H1-1&amp;#34;))
 H1b((&amp;#34;H1-2&amp;#34;))
 H1c((&amp;#34;H1-3&amp;#34;))
 end

 subgraph subGraph2[&amp;#34;Hidden Layer 2&amp;#34;]
 H2a((&amp;#34;H2-1&amp;#34;))
 H2b((&amp;#34;H2-2&amp;#34;))
 H2c((&amp;#34;H2-3&amp;#34;))
 end

 subgraph subGraph3[&amp;#34;Hidden Layer 3&amp;#34;]
 H3a((&amp;#34;H3-1&amp;#34;))
 H3b((&amp;#34;H3-2&amp;#34;))
 H3c((&amp;#34;H3-3&amp;#34;))
 end

 %% Output Layer
 subgraph subGraph4[&amp;#34;Output Layer&amp;#34;]
 O((&amp;#34;Output&amp;#34;))
 end

 %% Connections: Input to Hidden Layer 1
 I1 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I2 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I3 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c

 %% Connections: Hidden Layer 1 to Hidden Layer 2
 H1a --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1b --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1c --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c

 %% Connections: Hidden Layer 2 to Hidden Layer 3
 H2a --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2b --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2c --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c

 %% Connections: Hidden Layer 3 to Output
 H3a --&amp;gt; O
 H3b --&amp;gt; O
 H3c --&amp;gt; O

 %% Styling
 style I1 fill:#C8E6C9
 style I2 fill:#C8E6C9
 style I3 fill:#C8E6C9
 style H1a fill:#BBDEFB
 style H1b fill:#BBDEFB
 style H1c fill:#BBDEFB
 style H2a fill:#90CAF9
 style H2b fill:#90CAF9
 style H2c fill:#90CAF9
 style H3a fill:#64B5F6
 style H3b fill:#64B5F6
 style H3c fill:#64B5F6
 style O fill:#FFCDD2
 style subGraph0 stroke:none,fill:transparent
 style subGraph1 stroke:none,fill:transparent
 style subGraph2 stroke:none,fill:transparent
 style subGraph3 stroke:none,fill:transparent
 style subGraph4 stroke:none,fill:transparent
&lt;/pre>

&lt;hr>
&lt;h2 id="types-of-neural-networks">
 Types of Neural Networks
 
 &lt;a class="anchor" href="#types-of-neural-networks">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Standard NN - used for smaller, simpler data (e.g. Real Estate)&lt;/li>
&lt;li>CNN - Convolutional - used for images (e.g. Photo Tagging, Object Detection)&lt;/li>
&lt;li>RNN - Recurrent - used for sequential data such as text (e.g. Speech Recognition, Translation)&lt;/li>
&lt;li>Hybrid NN (e.g. Autonomous Driving)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="components-of-dl">
 Components of DL
 
 &lt;a class="anchor" href="#components-of-dl">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Data&lt;/li>
&lt;li>Learning Algorithm: how to transform the data&lt;/li>
&lt;li>&lt;strong>Loss Function&lt;/strong>: an objective function that &lt;strong>quantifies how well the model is doing&lt;/strong>; the lower the loss, the better the model.&lt;/li>
&lt;li>Optimisation Algorithm: searches for the best possible parameters to &lt;strong>minimise the loss function&lt;/strong>. Popular optimisation algorithms for deep learning are based on an approach called &lt;strong>gradient descent&lt;/strong>.&lt;/li>
&lt;li>Model&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures-1">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures-1">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Binary/discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>High-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step: 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: output = 
&lt;span>
 \( z \)
 &lt;/span>

&lt;/td>
 &lt;td>Sigmoid: maps 
&lt;span>
 \( z \)
 &lt;/span>

 to probability&lt;/td>
 &lt;td>ReLU (hidden), Softmax/Sigmoid (output)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>
&lt;span>
 \( J = \frac{1}{2N} \sum (Y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE)&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update on misclassification&lt;/td>
 &lt;td>Gradient Descent&lt;/td>
 &lt;td>Backpropagation (single layer)&lt;/td>
 &lt;td>Backpropagation (multi-layer chain rule)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Boolean (0 or 1)&lt;/td>
 &lt;td>Continuous value&lt;/td>
 &lt;td>Probability score&lt;/td>
 &lt;td>Probability vector (multi-class)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="applications">
 Applications
 
 &lt;a class="anchor" href="#applications">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Computer Vision (e.g., face detection, medical imaging)&lt;/li>
&lt;li>Natural Language Processing (e.g., ChatGPT, translation)&lt;/li>
&lt;li>Self Driving Cars&lt;/li>
&lt;li>Speech Assistants (e.g., Alexa, Siri)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="intution">
 Intution
 
 &lt;a class="anchor" href="#intution">#&lt;/a>
 
&lt;/h2>
&lt;p>Deep Learning is the methodology, DNN is a model.&lt;/p></description></item><item><title>LNN for Classification</title><link>https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/</guid><description>&lt;h1 id="linear-nn-for-classification">
 Linear NN for Classification
 
 &lt;a class="anchor" href="#linear-nn-for-classification">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>Linear Neural Network (LNN) for classification&lt;/strong> uses &lt;strong>no hidden layers&lt;/strong>.&lt;br>
It learns a &lt;strong>linear decision boundary&lt;/strong> and outputs &lt;strong>class probabilities&lt;/strong>, then converts them into predicted classes.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Neural-network view:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Binary classification&lt;/strong> → logistic regression (single neuron + sigmoid)&lt;/li>
&lt;li>&lt;strong>Multi-class classification&lt;/strong> → softmax regression (K output neurons + softmax)&lt;/li>
&lt;/ul>
&lt;/blockquote>
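&lt;p>A minimal Keras sketch of both views (the feature count and class count below are illustrative); each is a single &lt;code>Dense&lt;/code> layer with no hidden layers:&lt;/p>
&lt;pre>&lt;code class="language-python">from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Binary: logistic regression = one neuron + sigmoid
binary = Sequential([Dense(1, activation='sigmoid', input_shape=(4,))])
binary.compile(optimizer='sgd', loss='binary_crossentropy')

# Multi-class: softmax regression = K output neurons + softmax (here K = 3)
multi = Sequential([Dense(3, activation='softmax', input_shape=(4,))])
multi.compile(optimizer='sgd', loss='categorical_crossentropy')
&lt;/code>&lt;/pre>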
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 D[&amp;#34;Data&amp;lt;br/&amp;gt;X, y&amp;#34;] --&amp;gt; M[&amp;#34;Linear model&amp;lt;br/&amp;gt;w, b&amp;#34;]
 M --&amp;gt; A[&amp;#34;Activation&amp;lt;br/&amp;gt;Sigmoid / Softmax&amp;#34;]
 A --&amp;gt; L[&amp;#34;Loss&amp;lt;br/&amp;gt;Cross-entropy&amp;#34;]
 L --&amp;gt; O[&amp;#34;Optimiser&amp;lt;br/&amp;gt;Mini-batch GD / Adam&amp;#34;]
 O --&amp;gt; P[&amp;#34;Updated parameters&amp;lt;br/&amp;gt;w, b&amp;#34;]
 P --&amp;gt; I[&amp;#34;Inference&amp;lt;br/&amp;gt;Probabilities → class&amp;#34;]

 %% Pastel colour scheme
 style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
 style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
 style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
 style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
 style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
 style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
 style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
&lt;/pre>

&lt;hr>
&lt;h2 id="classification">
 Classification
 
 &lt;a class="anchor" href="#classification">#&lt;/a>
 
&lt;/h2>
&lt;p>Classification predicts a &lt;strong>discrete class label&lt;/strong>.&lt;br>
Common settings:&lt;/p></description></item><item><title>LLM - Model</title><link>https://arshadhs.github.io/docs/ai/genai/llm/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/genai/llm/</guid><description>&lt;h1 id="llm--large-language-model">
 LLM – Large Language Model
 
 &lt;a class="anchor" href="#llm--large-language-model">#&lt;/a>
 
&lt;/h1>
&lt;p>Large Language Models (LLMs) are &lt;strong>advanced AI systems&lt;/strong> designed to process, understand, and generate &lt;strong>human-like text&lt;/strong>.&lt;/p>
&lt;p>They learn language by analysing &lt;strong>massive amounts of text data&lt;/strong>, discovering patterns in:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>grammar&lt;/p>
&lt;/li>
&lt;li>
&lt;p>meaning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>context&lt;/p>
&lt;/li>
&lt;li>
&lt;p>relationships between words and sentences&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>LLMs are:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Built on &lt;strong>Deep Learning&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Implemented using &lt;strong>Neural Networks&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Based on &lt;strong>Transformers&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Often combined with tools like:&lt;/p>
&lt;ul>
&lt;li>Retrieval (RAG)&lt;/li>
&lt;li>Agents&lt;/li>
&lt;li>External APIs&lt;/li>
&lt;li>Memory systems&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="what-makes-an-llm-special">
 What makes an LLM special?
 
 &lt;a class="anchor" href="#what-makes-an-llm-special">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Built using &lt;strong>deep neural networks&lt;/strong>&lt;/li>
&lt;li>Trained on &lt;strong>very large datasets&lt;/strong> (books, articles, code, web text)&lt;/li>
&lt;li>Can perform many tasks &lt;strong>without task-specific training&lt;/strong>&lt;/li>
&lt;li>General-purpose language understanding, not single-task models&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="foundation-transformer-architecture">
 Foundation: Transformer Architecture
 
 &lt;a class="anchor" href="#foundation-transformer-architecture">#&lt;/a>
 
&lt;/h2>
&lt;p>LLMs are based on the &lt;strong>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/transformer/">Transformer Architecture&lt;/a>&lt;/strong>, which allows models to understand &lt;strong>context and long-range dependencies&lt;/strong> in text.&lt;/p></description></item><item><title>Deep Feedforward Neural Networks (DFNN) for Classification</title><link>https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/</guid><description>&lt;h1 id="deep-feedforward-neural-networks-dfnn-or-multi-layer-perceptrons-mlp-for-classification">
 Deep Feedforward Neural Networks (DFNN) or Multi Layer Perceptrons (MLP) for Classification
 
 &lt;a class="anchor" href="#deep-feedforward-neural-networks-dfnn-or-multi-layer-perceptrons-mlp-for-classification">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>Deep Feedforward Neural Network (DFNN)&lt;/strong>, also called a &lt;strong>Multi-Layer Perceptron (MLP)&lt;/strong>, is a neural network with one or more &lt;strong>hidden layers&lt;/strong> where information flows &lt;strong>forward only&lt;/strong> (no recurrence).&lt;br>
For classification, DFNNs learn &lt;strong>non-linear decision boundaries&lt;/strong> by combining hidden layers with &lt;strong>non-linear activation functions&lt;/strong>.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Core idea:&lt;/p>
&lt;ul>
&lt;li>A single neuron can only learn &lt;strong>linear&lt;/strong> boundaries.&lt;/li>
&lt;li>Adding &lt;strong>hidden layers + non-linearity&lt;/strong> allows DFNNs to solve problems like &lt;strong>XOR&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;/blockquote>
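&lt;p>A minimal Keras sketch of this idea (the hidden-layer size and epoch count are illustrative; one small hidden layer is enough to fit XOR):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)             # XOR truth table

model = Sequential([
    Dense(4, activation='tanh', input_shape=(2,)),  # hidden layer adds non-linearity
    Dense(1, activation='sigmoid'),                 # probability of class 1
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)             # tiny dataset, many epochs
print(model.predict(X).round().ravel())             # typically [0, 1, 1, 0]
&lt;/code>&lt;/pre>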
&lt;hr>
&lt;h2 id="mlp-as-solution-for-xor">
 MLP as solution for XOR
 
 &lt;a class="anchor" href="#mlp-as-solution-for-xor">#&lt;/a>
 
&lt;/h2>
&lt;p>A single perceptron fails on XOR because XOR is &lt;strong>not linearly separable&lt;/strong>.&lt;/p></description></item><item><title>Convolutional Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/</guid><description>&lt;h1 id="convolutional-neural-networks-cnn">
 Convolutional Neural Networks (CNN)
 
 &lt;a class="anchor" href="#convolutional-neural-networks-cnn">#&lt;/a>
 
&lt;/h1>
&lt;p>Convolutional Neural Networks (CNNs) are specialised neural networks designed for data with spatial structure, especially images. They became the standard model for computer vision because they preserve spatial locality, reuse the same pattern detector across the image, and build representations hierarchically. In practical terms, a CNN starts by learning simple features such as edges and corners, then combines them into textures, shapes, object parts, and finally full semantic categories.&lt;/p></description></item><item><title>Deep CNN Architectures</title><link>https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/</guid><description>&lt;h1 id="deep-cnn-architectures">
 Deep CNN Architectures
 
 &lt;a class="anchor" href="#deep-cnn-architectures">#&lt;/a>
 
&lt;/h1>
&lt;p>Once the basic ideas of convolution, pooling, channels, and classifier heads are understood, the next step is to study how successful CNN architectures are designed in practice. The history of deep CNNs is not just a list of famous models. It is a progression of design ideas: smaller filters, more depth, better optimisation, bottlenecks, multi-scale processing, residual connections, and transfer learning.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Deep CNN architectures evolved by solving specific problems one by one: &lt;strong>LeNet&lt;/strong> established the template, &lt;strong>AlexNet&lt;/strong> proved deep learning could dominate large-scale vision, &lt;strong>VGG&lt;/strong> simplified the design, &lt;strong>NiN&lt;/strong> introduced powerful &lt;code>1 × 1&lt;/code> ideas, &lt;strong>GoogLeNet&lt;/strong> made multi-scale processing efficient, and &lt;strong>ResNet&lt;/strong> solved the optimisation problem of very deep networks.&lt;/p></description></item><item><title>CNN Pipeline</title><link>https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/</guid><description>&lt;h1 id="cnn-pipeline-preprocessing--models">
 CNN Pipeline: Preprocessing &amp;amp; Models
 
 &lt;a class="anchor" href="#cnn-pipeline-preprocessing--models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Understand CNN concepts deeply&lt;/li>
&lt;li>Build CNN models step-by-step&lt;/li>
&lt;li>Apply CNNs in assignments using Keras&lt;/li>
&lt;/ul>
&lt;blockquote class="book-hint info">
&lt;p>Think of CNN as a pipeline:
Image → Features → Patterns → Prediction&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h1 id="1-image-representation">
 1. Image Representation
 
 &lt;a class="anchor" href="#1-image-representation">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;link rel="stylesheet" href="https://arshadhs.github.io/katex/katex.min.css" />
&lt;script defer src="https://arshadhs.github.io/katex/katex.min.js">&lt;/script>
 &lt;script defer src="https://arshadhs.github.io/katex/auto-render.min.js" onload="renderMathInElement(document.body, {
 &amp;#34;delimiters&amp;#34;: [
 {&amp;#34;left&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;display&amp;#34;: true},
 {&amp;#34;left&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\(&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\)&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\[&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\]&amp;#34;, &amp;#34;display&amp;#34;: true}
 ]
});">&lt;/script>
&lt;span>
 \[ 
X \in \mathbb{R}^{H \times W \times C}
 \]
 &lt;/span>
&lt;/span>
&lt;ul>
&lt;li>H = Height&lt;/li>
&lt;li>W = Width&lt;/li>
&lt;li>C = Channels&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="2-convolution-operation">
 2. Convolution Operation
 
 &lt;a class="anchor" href="#2-convolution-operation">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
Z(i,j) = \sum_{m,n} X(i+m, j+n) \cdot K(m,n)
 \]
 &lt;/span>
&lt;/span>
&lt;ul>
&lt;li>Sliding filter extracts features&lt;/li>
&lt;li>Produces feature maps&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="3-stride--padding">
 3. Stride &amp;amp; Padding
 
 &lt;a class="anchor" href="#3-stride--padding">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
\text{Output} = \frac{N - F + 2P}{S} + 1
 \]
 &lt;/span>
&lt;/span>
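&lt;p>For example, a &lt;code>32 × 32&lt;/code> input with filter size &lt;code>F = 5&lt;/code>, padding &lt;code>P = 0&lt;/code>, and stride &lt;code>S = 1&lt;/code> gives &lt;code>(32 - 5 + 0)/1 + 1 = 28&lt;/code>, i.e. a &lt;code>28 × 28&lt;/code> feature map.&lt;/p>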
&lt;hr>
&lt;h1 id="4-activation-relu">
 4. Activation (ReLU)
 
 &lt;a class="anchor" href="#4-activation-relu">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
\mathrm{ReLU}(x) = \max(0, x)
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="5-pooling">
 5. Pooling
 
 &lt;a class="anchor" href="#5-pooling">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Max Pooling → strongest feature&lt;/li>
&lt;li>Average Pooling → smooth&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="6-global-average-pooling">
 6. Global Average Pooling
 
 &lt;a class="anchor" href="#6-global-average-pooling">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
y_k = \frac{1}{HW} \sum_{i,j} x_{i,j,k}
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="7-loss-function">
 7. Loss Function
 
 &lt;a class="anchor" href="#7-loss-function">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
L = - \sum y \log(\hat{y})
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="8-cnn-architecture">
 8. CNN Architecture
 
 &lt;a class="anchor" href="#8-cnn-architecture">#&lt;/a>
 
&lt;/h1>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">graph LR
A[Input Image] --&amp;gt; B[Conv]
B --&amp;gt; C[ReLU]
C --&amp;gt; D[Pooling]
D --&amp;gt; E[Conv Layers]
E --&amp;gt; F[Flatten / GAP]
F --&amp;gt; G[Dense]
G --&amp;gt; H[Output]&lt;/pre>
&lt;hr>
&lt;h1 id="9-training">
 9. Training
 
 &lt;a class="anchor" href="#9-training">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Forward pass&lt;/li>
&lt;li>Loss computation&lt;/li>
&lt;li>Backpropagation&lt;/li>
&lt;li>Weight update&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="10-keras-implementation">
 10. Keras Implementation
 
 &lt;a class="anchor" href="#10-keras-implementation">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="model">
 Model
 
 &lt;a class="anchor" href="#model">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> tensorflow.keras.models &lt;span style="color:#f92672">import&lt;/span> Sequential
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> tensorflow.keras.layers &lt;span style="color:#f92672">import&lt;/span> Conv2D, MaxPooling2D, Dense, Flatten
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model &lt;span style="color:#f92672">=&lt;/span> Sequential()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Conv2D(&lt;span style="color:#ae81ff">32&lt;/span>, (&lt;span style="color:#ae81ff">3&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>), activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>, input_shape&lt;span style="color:#f92672">=&lt;/span>(&lt;span style="color:#ae81ff">64&lt;/span>,&lt;span style="color:#ae81ff">64&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(MaxPooling2D((&lt;span style="color:#ae81ff">2&lt;/span>,&lt;span style="color:#ae81ff">2&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Conv2D(&lt;span style="color:#ae81ff">64&lt;/span>, (&lt;span style="color:#ae81ff">3&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>), activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(MaxPooling2D((&lt;span style="color:#ae81ff">2&lt;/span>,&lt;span style="color:#ae81ff">2&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Flatten())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Dense(&lt;span style="color:#ae81ff">128&lt;/span>, activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Dense(&lt;span style="color:#ae81ff">1&lt;/span>, activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;sigmoid&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="compile">
 Compile
 
 &lt;a class="anchor" href="#compile">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>compile(optimizer&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;adam&amp;#39;&lt;/span>, loss&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;binary_crossentropy&amp;#39;&lt;/span>, metrics&lt;span style="color:#f92672">=&lt;/span>[&lt;span style="color:#e6db74">&amp;#39;accuracy&amp;#39;&lt;/span>])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="train">
 Train
 
 &lt;a class="anchor" href="#train">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>fit(X_train, y_train, epochs&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">10&lt;/span>, batch_size&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">32&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="predict">
 Predict
 
 &lt;a class="anchor" href="#predict">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>pred &lt;span style="color:#f92672">=&lt;/span> model&lt;span style="color:#f92672">.&lt;/span>predict(X_test)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h1 id="11-tips">
 11. Tips
 
 &lt;a class="anchor" href="#11-tips">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Normalize images&lt;/li>
&lt;li>Use small filters&lt;/li>
&lt;li>Avoid too many dense layers&lt;/li>
&lt;/ul>
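&lt;p>For the first tip, a two-line sketch assuming 8-bit pixel values:&lt;/p>
&lt;pre>&lt;code class="language-python"># Scale pixels from [0, 255] to [0, 1] before training
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
&lt;/code>&lt;/pre>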
&lt;hr>
&lt;h1 id="12-summary">
 12. Summary
 
 &lt;a class="anchor" href="#12-summary">#&lt;/a>
 
&lt;/h1>
&lt;blockquote class="book-hint info">
&lt;p>CNN = Automatic feature extractor + classifier&lt;/p></description></item><item><title>Recurrent Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/</guid><description>&lt;h1 id="recurrent-neural-networks">
 Recurrent Neural Networks
 
 &lt;a class="anchor" href="#recurrent-neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;p>Recurrent Neural Networks (RNNs) are neural networks designed for &lt;strong>sequential data&lt;/strong>, where the order of inputs matters and the model must use information from earlier time steps to interpret later ones. Unlike a feedforward network, an RNN does not process each input in isolation. It carries a &lt;strong>hidden state&lt;/strong> from one time step to the next, so the network can build a running summary of what it has seen so far.&lt;/p></description></item><item><title>Deep Recurrent Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/</guid><description>&lt;h1 id="deep-recurrent-neural-networks">
 Deep Recurrent Neural Networks
 
 &lt;a class="anchor" href="#deep-recurrent-neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;p>Vanilla RNNs introduce the hidden-state idea, but they struggle on longer and more complex sequences because gradients can vanish across time. Deep recurrent models extend the RNN idea in two important ways:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>make the recurrent architecture richer&lt;/strong>, for example by stacking multiple recurrent layers or using information from both directions,&lt;/li>
&lt;li>&lt;strong>use gates and memory cells&lt;/strong> to control what should be remembered, forgotten, updated, and exposed.&lt;/li>
&lt;/ol>
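&lt;p>Both extensions are one line each in Keras; a minimal sketch (sequence length, feature count, and layer sizes here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

model = Sequential([
    # Bidirectional + return_sequences=True so a second recurrent layer can stack on top
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(100, 16)),
    LSTM(32),                              # second (stacked) recurrent layer
    Dense(1, activation='sigmoid'),        # e.g. binary sequence classification
])
model.compile(optimizer='adam', loss='binary_crossentropy')
&lt;/code>&lt;/pre>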
&lt;p>This is why practical recurrent modelling usually moves from a simple RNN to &lt;strong>stacked RNNs, bidirectional RNNs, GRUs, or LSTMs&lt;/strong>.&lt;/p></description></item></channel></rss>