<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Attention on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/attention/</link><description>Recent content in Attention on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://arshadhs.github.io/tags/attention/index.xml" rel="self" type="application/rss+xml"/><item><title>Attention Mechanism</title><link>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</guid><description>&lt;h1 id="attention-mechanism">
 Attention Mechanism
 
 &lt;a class="anchor" href="#attention-mechanism">#&lt;/a>
 
&lt;/h1>
&lt;p>Attention is a deep learning mechanism that allows a model to focus on the most relevant parts of an input sequence when producing an output.&lt;/p>
&lt;p>Instead of compressing the whole input into one fixed vector, attention computes a weighted combination of the input representations, giving higher weights to the parts most relevant to the current output.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Attention answers a simple question:&lt;/p>

&lt;blockquote class='book-hint '>
 &lt;p>For the current prediction, which input tokens should the model focus on most?&lt;/p>
&lt;/blockquote>&lt;/blockquote>
&lt;ul>
&lt;li>Queries, Keys, and Values&lt;/li>
&lt;li>Attention Pooling by Similarity&lt;/li>
&lt;li>Attention Pooling via Nadaraya–Watson Regression&lt;/li>
&lt;li>Attention Scoring Functions&lt;/li>
&lt;li>Dot Product Attention&lt;/li>
&lt;li>Convenience Functions&lt;/li>
&lt;li>Scaled Dot Product Attention&lt;/li>
&lt;li>Additive Attention&lt;/li>
&lt;li>Bahdanau Attention Mechanism&lt;/li>
&lt;li>Multi-Head Attention&lt;/li>
&lt;li>Self-Attention&lt;/li>
&lt;li>Positional Encoding&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="why-attention-is-needed-">
 Why Attention Is Needed ☆
 
 &lt;a class="anchor" href="#why-attention-is-needed-">#&lt;/a>
 
&lt;/h2>
&lt;p>Traditional encoder-decoder RNN models compress the full input sequence into a single fixed-length context vector, which becomes a bottleneck for long inputs.&lt;/p></description></item></channel></rss>