<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mini-Batch Gradient Descent on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/mini-batch-gradient-descent/</link><description>Recent content in Mini-Batch Gradient Descent on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://arshadhs.github.io/tags/mini-batch-gradient-descent/index.xml" rel="self" type="application/rss+xml"/><item><title>Gradient Descent and Mini-Batch Gradient Descent</title><link>https://arshadhs.github.io/docs/ai/deep-learning/101-gradient-descent-and-mini-batch-gd/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/101-gradient-descent-and-mini-batch-gd/</guid><description>&lt;h1 id="optimisation-gradient-descent-and-mini-batch-gradient-descent">
 Optimisation: Gradient Descent and Mini-Batch Gradient Descent
 
 &lt;a class="anchor" href="#optimisation-gradient-descent-and-mini-batch-gradient-descent">#&lt;/a>
 
&lt;/h1>
&lt;p>Gradient descent is the core optimisation method behind neural network training.
It updates the model parameters iteratively, each time taking a step in the direction opposite to the gradient of the loss with respect to those parameters.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Gradient descent uses the gradient to decide how to change the parameters.
The learning rate controls how large each update step is.&lt;/p>
&lt;/blockquote>
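&lt;p>As a minimal sketch of that takeaway, the Python below runs plain gradient descent on a toy one-parameter loss.
The loss L(w) = (w - 3)^2, the function names, the starting point and the learning rate are all illustrative assumptions, not values taken from this article.&lt;/p>
&lt;pre>&lt;code class="language-python">def loss_gradient(w):
    # Gradient of the assumed toy loss L(w) = (w - 3)**2.
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate, steps):
    # Repeatedly step against the gradient; learning_rate scales each step.
    w = w0
    for _ in range(steps):
        w = w - learning_rate * loss_gradient(w)
    return w


w_final = gradient_descent(w0=0.0, learning_rate=0.1, steps=100)
print(w_final)  # ends very close to the minimiser w = 3
&lt;/code>&lt;/pre>
&lt;p>A larger learning rate takes bigger steps. For this particular toy loss, any learning rate above 1.0 makes the iterates move away from the minimum instead of towards it.&lt;/p>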
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Gradient Descent Variants&amp;#34;] --&amp;gt; B[&amp;#34;Batch Gradient Descent&amp;#34;]
 A --&amp;gt; C[&amp;#34;Stochastic Gradient Descent&amp;#34;]
 A --&amp;gt; D[&amp;#34;Mini-batch Gradient Descent&amp;#34;]

 B --&amp;gt; B1[&amp;#34;Uses full dataset&amp;#34;]
 B --&amp;gt; B2[&amp;#34;One update per epoch&amp;#34;]
 B --&amp;gt; B3[&amp;#34;Smooth but slow&amp;#34;]

 C --&amp;gt; C1[&amp;#34;Uses one example at a time&amp;#34;]
 C --&amp;gt; C2[&amp;#34;Frequent updates&amp;#34;]
 C --&amp;gt; C3[&amp;#34;Fast but noisy&amp;#34;]

 D --&amp;gt; D1[&amp;#34;Uses small batches&amp;#34;]
 D --&amp;gt; D2[&amp;#34;Efficient on hardware&amp;#34;]
 D --&amp;gt; D3[&amp;#34;Balanced and practical&amp;#34;]

 style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
 style B fill:#EDE7F6,stroke:#7E57C2
 style C fill:#C8E6C9,stroke:#43A047
 style D fill:#FFF9C4,stroke:#FBC02D
&lt;/pre>
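&lt;p>The three variants in the diagram can be sketched as one training loop in which only the batch size changes.
This is a hedged illustration, not the implementation used by the article: the toy data, the one-weight model y = w * x, the function name and the hyperparameters are all assumptions.&lt;/p>
&lt;pre>&lt;code class="language-python">import random


def minibatch_gradient_descent(xs, ys, batch_size, learning_rate, epochs, seed=0):
    # Fits y = w * x by minimising mean squared error.
    # batch_size == len(xs) gives batch gradient descent (one update per epoch),
    # batch_size == 1 gives stochastic gradient descent,
    # anything in between gives mini-batch gradient descent.
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    updates = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # new example order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Mean gradient of (w * x - y)**2 over the current batch.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w = w - learning_rate * grad
            updates += 1
    return w, updates


xs = [0.1 * i for i in range(1, 9)]  # 8 toy inputs (assumed)
ys = [2.0 * x for x in xs]           # targets from the assumed true weight 2.0

for batch_size in (len(xs), 1, 4):
    w, updates = minibatch_gradient_descent(xs, ys, batch_size, learning_rate=0.5, epochs=200)
    print(batch_size, updates, round(w, 3))
&lt;/code>&lt;/pre>
&lt;p>With 8 examples and 200 epochs, the full-batch run performs 200 updates, the single-example run 1600, and the batch-of-4 run 400. Because the toy data is noise-free, all three settle near the same weight here; on real data the smaller batches would show noisier updates.&lt;/p>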

&lt;hr>
&lt;h2 id="gradient-descent-rule-">
 Gradient Descent Rule ☆
 
 &lt;a class="anchor" href="#gradient-descent-rule-">#&lt;/a>
 
&lt;/h2>
&lt;p>The gradient of the loss with respect to the parameters points in the direction in which the loss increases fastest.
To reduce the loss, we move in the opposite direction.&lt;/p></description></item></channel></rss>