Video Link: https://youtu.be/r305-aQTaU0?si=K0v034jqna1xrLzU
Tokens in diffusion models can attend to both past AND future tokens, unlike autoregressive models, where each token only sees the tokens before it
Since big chunks are generated at once, each token can look at what comes later in the sequence and correct itself
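
A rough sketch of the difference in which positions a token can see. This is not taken from the video: the function names (causal_mask, bidirectional_mask, visible_positions) and the chunk length of 5 are made up for illustration, and a real model would apply the mask inside its attention layers rather than as plain Python lists.

```python
def causal_mask(n):
    # Autoregressive style: position i may attend only to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    # Diffusion style (assumed full attention within the chunk):
    # every position may attend to every other position, past or future.
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    # List the positions that position i is allowed to attend to.
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n = 5  # hypothetical chunk length
print(visible_positions(causal_mask(n), 1))         # [0, 1]
print(visible_positions(bidirectional_mask(n), 1))  # [0, 1, 2, 3, 4]
```

With the causal mask, position 1 never sees positions 2 to 4, so it cannot react to them. With the bidirectional mask it sees the whole chunk, which is what lets a token be revised based on what comes after it.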