PyTorch’s Autograd feature is what makes PyTorch flexible and fast for building machine learning projects. It allows for the rapid computation of multiple partial derivatives (gradients) over a complex computation.

Differentiation in Autograd

When creating a PyTorch tensor, setting the parameter requires_grad=True signals to autograd that every operation on that tensor should be tracked.

We create two tensors a and b with requires_grad=True, from which a tensor Q is then built (see the sketch after the snippet):

import torch
 
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
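
The definition of Q is not shown in this section; as a minimal sketch, assume the polynomial used in the official PyTorch autograd tutorial, Q = 3a³ - b²:

Q = 3*a**3 - b**2   # Q is built from a and b, so autograd tracks each operation
print(Q)            # tensor([-12.,  65.], grad_fn=<SubBackward0>)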

Computational Graph

Autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG).

In this DAG, the leaves are the input tensors and the roots are the output tensors. By tracing the graph from roots to leaves, you can automatically compute the gradients using the chain rule.
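
Continuing with the a, b, and Q from the sketch above (an assumption carried over from that snippet), the leaves and the root can be inspected directly: user-created leaf tensors have no grad_fn, while the output tensor carries the grad_fn that roots the backward graph.

print(a.is_leaf, b.is_leaf)   # True True -> a and b are leaves of the DAG
print(a.grad_fn)              # None -> user-created leaves have no backward function
print(Q.is_leaf)              # False -> Q was produced by tracked operations
print(Q.grad_fn)              # <SubBackward0 object at 0x...> -> root of the backward graph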

Forward Pass:

  • Run the requested operation to compute a resulting tensor
  • Maintain the operation’s gradient function (grad_fn) in the DAG, as sketched after this list
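
A minimal sketch of that bookkeeping, using an illustrative tensor x that is not part of the running example: every tracked operation stores its backward function on the result, and wrapping the operation in torch.no_grad() skips the recording.

x = torch.ones(2, requires_grad=True)

y = x * 2                 # forward pass: computes y and records MulBackward0 in the DAG
print(y.grad_fn)          # <MulBackward0 object at 0x...>

with torch.no_grad():     # operations in this block are not tracked
    z = x * 2
print(z.grad_fn)          # None -> nothing was recorded for z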

The backward pass kicks off when .backward() is called on the DAG root. Autograd then:

  • computes the gradients from each .grad_fn
  • accumulates them in the respective tensor’s .grad attribute
  • using the chain rule, propagates all the way to the leaf tensors (see the sketch after this list)
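
A sketch of the backward pass, again assuming Q = 3a³ - b² from above. Because Q is a vector, .backward() needs a gradient argument of Q’s shape (here all ones, i.e. dQ/dQ); autograd then deposits dQ/da = 9a² in a.grad and dQ/db = -2b in b.grad.

external_grad = torch.tensor([1., 1.])   # dQ/dQ, one entry per element of Q
Q.backward(gradient=external_grad)       # kicks off the backward pass at the DAG root

print(a.grad)                  # tensor([36., 81.])  -> equals 9*a**2
print(b.grad)                  # tensor([-12., -8.]) -> equals -2*b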

In a visual representation of the DAG for this example:

  • Arrows show the direction of the forward pass
  • Nodes represent the backward function (grad_fn) of each operation in the forward pass
  • Leaf nodes represent the leaf tensors a and b