Recurrent Neural Networks (RNNs) are neural networks designed for sequential data, where the order of inputs matters and the model must use information from earlier time steps to interpret later ones. Unlike a feedforward network, an RNN does not process each input in isolation. It carries a hidden state from one time step to the next, so the network can build a running summary of what it has seen so far.
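The hidden-state idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes the common update rule h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), and the names `rnn_step`, `run_rnn`, `W_xh`, `W_hh`, `b_h` and the toy weights are made up for the example.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step (assumed form): h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(inputs, h0, W_xh, W_hh, b_h):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = h0
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # the new state depends on the old one
        states.append(h)
    return states

# Toy weights, chosen arbitrarily: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]
h0 = [0.0, 0.0]

sequence = [[1.0], [0.0], [-1.0]]
states = run_rnn(sequence, h0, W_xh, W_hh, b_h)
reversed_states = run_rnn(sequence[::-1], h0, W_xh, W_hh, b_h)

print(states[-1])
print(reversed_states[-1])
```

Feeding the same inputs in reverse order gives a different final state, which is the sense in which the hidden state is an order-dependent summary of the sequence.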
Vanilla RNNs introduce the hidden-state idea, but they struggle on longer and more complex sequences because gradients can vanish across time. Deep recurrent models extend the RNN idea in two important ways:
- make the recurrent architecture richer, for example by stacking multiple recurrent layers or using information from both directions,
- use gates and memory cells to control what should be remembered, forgotten, updated, and exposed.
This is why practical recurrent modelling usually moves from a simple RNN to stacked RNNs, bidirectional RNNs, GRUs, or LSTMs.
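To make the gating idea concrete, here is a scalar sketch of a single GRU step. It is only one possible formulation: it assumes the convention h_t = z * h_{t-1} + (1 - z) * h_candidate (some libraries swap the roles of z and 1 - z), it uses a hidden size of 1, and the parameter names in the dictionary are hypothetical. An LSTM uses a separate memory cell and more gates, which this sketch does not cover.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, p):
    """One scalar GRU step under the assumed convention described above."""
    # Update gate: how much of the old state to keep.
    z = sigmoid(p["w_z"] * x_t + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds the candidate.
    r = sigmoid(p["w_r"] * x_t + p["u_r"] * h_prev + p["b_r"])
    # Candidate state computed from the input and the (reset) old state.
    h_cand = math.tanh(p["w_h"] * x_t + p["u_h"] * (r * h_prev) + p["b_h"])
    # Blend old state and candidate according to the update gate.
    return z * h_prev + (1.0 - z) * h_cand

# Illustrative parameters only; real values would be learned.
base = {"w_z": 0.0, "u_z": 0.0, "b_z": 0.0,
        "w_r": 0.0, "u_r": 0.0, "b_r": 0.0,
        "w_h": 1.0, "u_h": 1.0, "b_h": 0.0}

# A large positive update-gate bias pushes z towards 1: the old state is kept.
kept = gru_step(-2.0, 0.7, dict(base, b_z=50.0))
# A large negative bias pushes z towards 0: the state is overwritten by the candidate.
overwritten = gru_step(-2.0, 0.7, dict(base, b_z=-50.0))

print(kept, overwritten)
```

The two calls differ only in the update-gate bias, which shows how a gate lets the network choose between remembering the previous state and replacing it.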
Bayesian Learning is a probabilistic approach to machine learning.
Instead of only asking, “Which output should the model predict?”, Bayesian Learning asks:
Given the data we have observed, how likely is each hypothesis, class, or parameter value?
This makes Bayesian Learning useful when uncertainty matters.
It is especially important in classification, probabilistic modelling, generative models, and situations where we want to combine prior knowledge with observed data.
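The question above is answered by Bayes' rule: the posterior is proportional to the prior times the likelihood. The sketch below works through a tiny two-class example. The class names, the prior, and the likelihood values are invented purely for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses.

    prior[h]      = P(h)
    likelihood[h] = P(data | h)
    returns         P(h | data) for every h
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(data)
    return {h: v / evidence for h, v in unnormalised.items()}

# Made-up numbers: prior belief about the class of an email,
# and the probability of seeing one particular word under each class.
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.6, "ham": 0.05}

post = posterior(prior, likelihood)
print(post)  # roughly spam 0.75, ham 0.25
```

Although spam starts with a prior of only 0.2, the observed word is much more likely under spam, so the posterior shifts to about 0.75. This is the sense in which prior knowledge and observed data are combined.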
Optimisers are algorithms that update neural network parameters to reduce the loss function.
Deep networks often have millions or billions of parameters, and their loss is a highly non-linear function of those parameters, so there is generally no closed-form solution for the best parameter values.
Instead, training uses iterative optimisation.
Key takeaway: An optimiser decides how the model moves through the loss landscape towards lower loss.
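As a minimal sketch of iterative optimisation, the loop below applies plain gradient descent to a one-parameter quadratic loss. The loss function, starting point, learning rate, and step count are all assumptions chosen for illustration. This toy loss does have a closed-form minimum at w = 3, which makes it easy to see that the iterates move towards it.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w0, lr, steps):
    """Repeatedly step against the gradient, recording the loss after each step."""
    w = w0
    history = [loss(w)]
    for _ in range(steps):
        w = w - lr * grad(w)  # update rule: w <- w - lr * dL/dw
        history.append(loss(w))
    return w, history

w_final, history = gradient_descent(w0=0.0, lr=0.1, steps=50)
print(w_final, history[0], history[-1])
```

Each step moves `w` a little further downhill, so the recorded loss shrinks from 9 towards 0. The optimisers listed below differ mainly in how they choose the direction and size of this step.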
Goal of Optimization
Optimization Challenges in Deep Learning
Gradient Descent
Stochastic Gradient Descent
Minibatch Stochastic Gradient Descent
Momentum
Adagrad and Algorithm
RMSProp and Algorithm
Adadelta and Algorithm
Adam and Algorithm
Code Implementation and comparison of algorithms (webinar)
flowchart TD
A["Optimisers in DNN"] --> B["Gradient Descent Variants"]
A --> C["Momentum-based Optimiser"]
A --> D["Adaptive Methods"]
A --> E["Learning Rate Schedules"]
D --> D1["Parameter-specific learning rates"]
E --> E1["Learning rate changes during training"]
style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
style B fill:#EDE7F6,stroke:#7E57C2
style C fill:#C8E6C9,stroke:#43A047
style D fill:#FFF9C4,stroke:#FBC02D
style E fill:#F8BBD0,stroke:#D81B60