Blog Link: https://pub.sakana.ai/diffusionblocks/

Today’s neural networks are trained via end-to-end optimization, where all parameters are learned jointly. Because the entire network must be processed together, the memory required for training grows with model size.
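To make the memory claim concrete, here is a rough back-of-the-envelope sketch. It assumes a network made of identical blocks, 32-bit values, and an optimizer that keeps two extra values per parameter (as Adam does). The function name and all sizes are hypothetical, and the accounting ignores techniques such as activation checkpointing or mixed precision.

```python
def training_memory_bytes(num_blocks, params_per_block, activations_per_block,
                          bytes_per_value=4, optimizer_slots=2):
    """Rough estimate of the memory needed to train `num_blocks` blocks jointly.

    All arguments are illustrative assumptions, not measurements.
    """
    # Each parameter needs its weight, its gradient, and the optimizer state.
    copies_per_param = 1 + 1 + optimizer_slots
    param_bytes = num_blocks * params_per_block * copies_per_param * bytes_per_value
    # End-to-end backpropagation keeps every block's activations
    # until the backward pass reaches that block.
    activation_bytes = num_blocks * activations_per_block * bytes_per_value
    return param_bytes + activation_bytes


# Hypothetical block: 1M parameters, 500k stored activation values.
small = training_memory_bytes(12, 1_000_000, 500_000)
large = training_memory_bytes(24, 1_000_000, 500_000)
print(f"12 blocks: {small / 1e6:.0f} MB, 24 blocks: {large / 1e6:.0f} MB")
```

Under these assumptions the estimate is linear in the number of blocks, so doubling the depth doubles the memory that must be available at once during training.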