Docs Link: https://docs.pytorch.org/tutorials/beginner/dist_overview.html
The torch.distributed library includes a collection of parallelism modules, a communications layer, and infrastructure for launching and debugging large training jobs
DistributedDataParallel (DDP) is a PyTorch module for data-parallel training: the model is replicated across multiple processes, which can span multiple machines, making it well suited to large-scale deep learning applications
DDP uses collective communications from the torch.distributed package to synchronize gradients and buffers across processes; each process has its own copy of the model, but all of them work together to train it as if it were on a single machine
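The gradient synchronization idea can be sketched in plain Python, without PyTorch. This is only a conceptual simulation under simplifying assumptions: a one-parameter model y = w * x, equal-sized data shards, and ranks simulated as list entries. The helper names `local_gradient`, `all_reduce_mean`, and `train_step` are hypothetical and are not part of the torch.distributed API.

```python
# Conceptual simulation of DDP-style gradient averaging (no PyTorch needed).
# Model: y = w * x, loss = mean squared error on each rank's data shard.

def local_gradient(w, shard):
    """Gradient of the mean squared error w.r.t. w on one rank's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every rank receives the same average."""
    avg = sum(values) / len(values)
    return [avg for _ in values]

def train_step(weights, shards, lr):
    """One step: local backward pass, gradient sync, identical update per rank."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    synced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, synced)]

# Four samples of y = 2x, split evenly across two simulated ranks.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

# Both replicas start from the same weight.
weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, shards, lr=0.05)
```

Because every rank applies the same averaged gradient to the same starting weight, the replicas stay identical after each step, and with equal shard sizes the averaged gradient matches the full-batch gradient a single process would compute.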
DDP broadcasts the model state from the rank 0 process to all other processes in the DDP constructor, so you don't have to worry about processes starting from different initial parameter values
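The effect of that initial broadcast can be illustrated with a small plain-Python sketch. It is a simulation only: each rank's state is assumed to be a flat dict of floats, ranks are list entries, and `broadcast_from_rank0` is a hypothetical helper rather than a PyTorch function.

```python
import random

def broadcast_from_rank0(replica_states):
    """Stand-in for the constructor broadcast: copy rank 0's state to every rank."""
    source = replica_states[0]
    return [dict(source) for _ in replica_states]

# Each simulated rank initializes its parameters with a different seed,
# so the replicas would disagree before any synchronization.
replicas = []
for rank in range(4):
    rng = random.Random(rank)
    replicas.append({"weight": rng.uniform(-1, 1), "bias": rng.uniform(-1, 1)})

rank0_before = dict(replicas[0])

# After the broadcast, every rank holds its own copy of rank 0's values.
replicas = broadcast_from_rank0(replicas)
```

Each rank ends up with an independent copy of the same values, which is why differing random seeds across processes do not lead to diverging replicas at the start of training.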