Video Link: https://youtu.be/r305-aQTaU0?si=K0v034jqna1xrLzU

Tokens in diffusion models can attend to both past AND future tokens, unlike autoregressive models, where each token only sees the tokens before it

Since big chunks are generated at once, each token can look at the tokens that come after it in the chunk, not just the ones before it, which gives the model the ability to correct itself
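
A rough way to picture the difference is as an attention mask: a grid saying which positions each token is allowed to look at. The sketch below is only an illustration, not code from the video. The helper names (`causal_mask`, `bidirectional_mask`, `visible_positions`) are made up for these notes, and it assumes the diffusion model uses full attention over the whole chunk being generated.

```python
def causal_mask(n):
    """Autoregressive-style mask: position i may attend to j only when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Assumed diffusion-style mask: every position may attend to every other."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """List the positions that token i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n = 4  # a chunk of 4 tokens
print(visible_positions(causal_mask(n), 1))         # [0, 1]
print(visible_positions(bidirectional_mask(n), 1))  # [0, 1, 2, 3]
```

With the causal mask, token 1 only sees positions 0 and 1. With the bidirectional mask it also sees positions 2 and 3, which is the "looking at the future" that lets it be revised based on later tokens.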