Recurrent Neural Networks (RNNs) are a class of neural networks designed to recognize patterns in sequential data such as text, audio, and time series.

Intuition Behind RNNs

Recurrent Neural Networks can be thought of as a feedforward network applied repeatedly, with a feedback loop that carries information from one step to the next.

They maintain a hidden state, which acts like a memory the model can refer to. This hidden state is updated on every pass through the network, i.e. at every time step of the sequence.

RNNs share weights: the same weight matrices are reused at every time step, which makes them parameter-efficient and lets them handle sequences of varying lengths.
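A minimal sketch of a single recurrence in NumPy makes the hidden-state update and weight sharing concrete. The sizes, parameter names (`W_xh`, `W_hh`), and random inputs here are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence: the new hidden state mixes the current input with the
    previous hidden state, using the *same* weights at every time step."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5

# Shared parameters, reused for every time step.
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)               # initial hidden state ("memory")
for t in range(seq_len):                # works for any sequence length
    x_t = rng.normal(size=input_size)   # stand-in for the t-th input
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```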

Gradients

  • During training, gradients are computed via backpropagation to update the weights in the network
  • In RNNs, this process is extended to Backpropagation Through Time (BPTT), where gradients are propagated backwards across many time steps
  • The weight updates depend on the gradients, which are derived by repeatedly applying the chain rule (written out after this list)
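Written out with the chain rule, the gradient that reaches an early time step is a product of per-step Jacobians. This is the standard BPTT identity; the notation assumes the tanh update $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$ from the sketch above.

$$
\frac{\partial L}{\partial h_k}
= \frac{\partial L}{\partial h_T}\prod_{t=k+1}^{T}\frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}}
= \operatorname{diag}\!\left(1 - h_t^{2}\right) W_{hh}^{\top}
$$

Whether this repeated product shrinks or grows with the number of time steps is exactly what the next two sections describe.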

Vanishing Gradients

  • If the weights have values such that the gradients (partial derivatives) are consistently smaller than 1, multiplying them across many layers or time steps causes the gradients to shrink exponentially.
  • Eventually, the gradients become so small that they effectively vanish (approach zero), preventing the network from learning long-term dependencies; the numeric sketch below shows how quickly this happens.
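A tiny numeric sketch, purely illustrative: if each step contributes a gradient factor below 1, the product over many steps collapses toward zero.

```python
import numpy as np

factor = 0.9            # stand-in for the per-step gradient magnitude
steps = np.array([10, 50, 100])

# Multiplying the same sub-unit factor across many time steps
# shrinks the overall gradient exponentially.
print(factor ** steps)  # -> [3.5e-01, 5.2e-03, 2.7e-05]: effectively vanished
```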

Exploding Gradients

  • Conversely, if the weights are consistently larger than 1, gradients grow exponentially as they are propagated backward.
  • This leads to very large weight updates, causing instability during training and divergence of the model (see the numeric sketch below).
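The mirror-image sketch, again with made-up numbers: per-step factors above 1 grow exponentially when multiplied backward through time, and the resulting update can throw a weight far from its previous value.

```python
import numpy as np

factor = 1.1            # stand-in for the per-step gradient magnitude
steps = np.array([10, 50, 100])

# The same super-unit factor compounds into a huge gradient.
print(factor ** steps)  # -> [2.6e+00, 1.2e+02, 1.4e+04]

# A huge gradient produces a huge weight update, which overshoots
# and destabilizes training (weight and learning rate are made up).
w, lr, grad = 0.5, 0.01, 1.4e4
print(w - lr * grad)    # -> -139.5: the weight jumps far from its old value
```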