Self Attention is an application of Scaled Dot-Product Attention

Self Attention is when you apply Scaled Dot-Product Attention to a single sequence, using the same input for Q, K, and V.
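
For reference, scaled dot-product attention (with $d_k$ the key dimension) computes:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$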

Self attention lets each word “look at” other words to decide what’s important, using Scaled Dot-Product Attention. So:

$$Q = XW^Q, \quad K = XW^K, \quad V = XW^V$$

where $W^Q$, $W^K$, and $W^V$ are learned weight matrices.
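
To make this concrete, here is a minimal NumPy sketch of self attention; the variable names, toy shapes, and random inputs are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product attention where Q, K, and V all come from the same X."""
    Q = X @ W_q                        # (seq_len, d_k)
    K = X @ W_k                        # (seq_len, d_k)
    V = X @ W_v                        # (seq_len, d_v)
    d_k = Q.shape[-1]
    # Each row of `scores` holds one word's similarity to every word in the sequence
    scores = Q @ K.T / np.sqrt(d_k)    # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted mix of all the value vectors
    return weights @ V                 # (seq_len, d_v)

# Toy example (assumed shapes): a 4-word sequence with 8-dimensional embeddings
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

The division by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.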