Self Attention is an application of Scaled Dot-Product Attention
Self Attention applies Scaled Dot-Product Attention to a single sequence, using the same input X as the source for Q, K, and V.
Self attention lets each word “look at” the other words to decide what’s important, using Scaled Dot-Product Attention. So:

Q = X W_Q,  K = X W_K,  V = X W_V

where W_Q, W_K, and W_V are learned weight matrices.
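The projections and the attention step can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function names, shapes, and random weights are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Same input X is projected into queries, keys, and values
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each word's weights over all words sum to 1
    return weights @ V

# Toy example: 4 words, model dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)
```

Each row of the output is a weighted mix of all the value vectors, so every word’s representation reflects the words it attended to.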