Docs Link: https://docs.pytorch.org/tutorials/beginner/dist_overview.html

The torch.distributed library includes a collection of parallelism modules, a communications layer, and infrastructure for launching and debugging large training jobs

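DistributedDataParallel (DDP) is a PyTorch module for data-parallel training: the model is replicated across multiple processes, which can span multiple machines, and each replica trains on a different portion of the data, making it well suited to large-scale deep learning applications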

DDP uses the communication primitives in the torch.distributed package to synchronize gradients and buffers across processes; each process has its own copy of the model, but all of them work together to train it as if it were on a single machine
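To see why synchronized gradients make the replicas behave like one model, here is a rough pure-Python simulation (no torch involved). It assumes a toy one-parameter model `y = w * x`, a mean-squared-error loss, and equal-sized data shards; `mse_grad`, `all_reduce_mean`, and `ddp_step` are hypothetical helper names for this sketch, not PyTorch APIs. Each simulated process computes a gradient on its own shard, the gradients are averaged, and every replica applies the same update.

```python
# Pure-Python simulation of DDP-style gradient averaging (illustration only).
# Toy model: y = w * x, loss = mean squared error over the batch.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over `batch`."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for an averaging all-reduce: every process gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def ddp_step(weights, shards, lr):
    """One simulated step: local backward, gradient averaging, local update."""
    local_grads = [mse_grad(w, shard) for w, shard in zip(weights, shards)]
    synced_grads = all_reduce_mean(local_grads)
    return [w - lr * g for w, g in zip(weights, synced_grads)]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]   # two simulated processes, equal-sized shards
replica_weights = [0.0, 0.0]    # replicas start from identical values
single_w = 0.0                  # reference: one process, full batch

for _ in range(5):
    replica_weights = ddp_step(replica_weights, shards, lr=0.01)
    single_w -= 0.01 * mse_grad(single_w, data)

print(replica_weights, single_w)
```

Because every replica applies the same averaged gradient, the replicas never drift apart, and with equal-sized shards the averaged gradient matches the full-batch gradient up to floating-point rounding.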

DDP broadcasts the model state from the rank 0 process to all other processes in the DDP constructor, so you don't have to worry about DDP processes starting from different initial model parameter values
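A minimal sketch of checking the constructor broadcast, assuming a CPU-only run with the gloo backend. The function name `run_demo`, the seed values, and the default address and port are assumptions made for this example. Each rank seeds its model differently on purpose, wraps it in DDP, then gathers the parameters from all ranks and compares them. Run as a single process (the default here) the check is trivially true; it only becomes meaningful when launched with several processes, for example through a launcher such as torchrun that sets `RANK` and `WORLD_SIZE`.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_demo():
    # A launcher normally sets these variables; the fallbacks are assumptions
    # so the sketch can also run as a single CPU process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))

    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
    try:
        # Seed each rank differently so the local initial weights differ.
        torch.manual_seed(1234 + rank)
        model = nn.Linear(4, 2)

        # Wrapping the model broadcasts rank 0's parameters and buffers.
        ddp_model = DDP(model)

        # Gather a flattened copy of the parameters from every rank.
        flat = torch.cat([p.detach().reshape(-1) for p in ddp_model.parameters()])
        gathered = [torch.zeros_like(flat) for _ in range(world_size)]
        dist.all_gather(gathered, flat)

        # True when every rank holds the same values as rank 0.
        return all(torch.equal(gathered[0], g) for g in gathered)
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print("parameters identical across ranks:", run_demo())
```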