Deep Learning

Deep Recurrent Neural Networks

Deep Recurrent Neural Networks #

Vanilla RNNs introduce the hidden-state idea, but they struggle on longer and more complex sequences because gradients can vanish across time. Deep recurrent models extend the RNN idea in two important ways:

  1. make the recurrent architecture richer, for example by stacking multiple recurrent layers or using information from both directions,
  2. use gates and memory cells to control what should be remembered, forgotten, updated, and exposed.

This is why practical recurrent modelling usually moves from a simple RNN to stacked RNNs, bidirectional RNNs, GRUs, or LSTMs.