Paper Link: https://arxiv.org/abs/2104.09864
Formula (2D case, where $\mathbf{W}_{\{q,k\}}$ is the query/key projection matrix):

$$f_{\{q,k\}}(\mathbf{x}_m, m) = \left(\mathbf{W}_{\{q,k\}}\,\mathbf{x}_m\right) e^{i m \theta}$$

Key Components:
- $\theta$: a preset, fixed non-zero scalar
- $\mathbf{x}_m$: the embedding vector at position $m$; in 2D it is treated as the complex number $x_m^{(1)} + i\,x_m^{(2)}$ (see the sketch after this list)
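
A minimal NumPy sketch of that 2D case (the function name `rope_2d`, the value $\theta = 1.0$, and the toy vectors are illustrative assumptions, not taken from the paper): multiplying by $e^{im\theta}$ is just a rotation of the 2D embedding by the angle $m\theta$, so the query–key dot product ends up depending only on the offset $m - n$.

```python
import numpy as np

def rope_2d(x, m, theta=1.0):
    """Rotate a 2D vector x at position m by the angle m * theta.

    Equivalent to viewing x as the complex number x[0] + i*x[1]
    and multiplying it by e^{i * m * theta}.
    """
    angle = m * theta
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ x

# Toy (post-projection) query and key vectors
q = np.array([1.0, 0.5])
k = np.array([0.3, -0.8])

# Attention score between positions m = 5 and n = 2 ...
score = rope_2d(q, 5) @ rope_2d(k, 2)
# ... equals the score at positions 8 and 5: only the offset m - n matters
score_shifted = rope_2d(q, 8) @ rope_2d(k, 5)
print(np.isclose(score, score_shifted))  # True
```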
Paper Details
In the paper, when the attention score is written out, notice how they transpose the query rather than the key: the score between positions $m$ and $n$ is expressed as $\mathbf{q}_m^{\top}\mathbf{k}_n$, not the more usual $QK^{\top}$ form.
Advantage over others
Sinusoidal positional encodings are less attractive than rotary position embeddings because they are added to the embedding, so no clear, consistent pattern is preserved as the position counter goes up; the output can change significantly (e.g., in magnitude). RoPE instead rotates the embedding, which leaves its magnitude unchanged (see the sketch below).
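
A small sketch comparing the two on this point (assuming the standard additive sinusoidal encoding from "Attention Is All You Need" and an illustrative 8-dimensional vector; `sinusoidal_pe` and `rope` are hypothetical helper names): the additive output's norm drifts with position, while the rotary output's norm stays equal to $\lVert x \rVert$.

```python
import numpy as np

def sinusoidal_pe(m, d, base=10000.0):
    """Additive sinusoidal position encoding for a single position m."""
    freqs = base ** (-2.0 * np.arange(d // 2) / d)
    pe = np.zeros(d)
    pe[0::2] = np.sin(m * freqs)   # even dims: sine
    pe[1::2] = np.cos(m * freqs)   # odd dims: cosine
    return pe

def rope(x, m, base=10000.0):
    """Rotary embedding: rotate consecutive (2i, 2i+1) pairs of x by m * theta_i."""
    d = x.shape[0]
    angles = m * base ** (-2.0 * np.arange(d // 2) / d)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(angles) - x2 * np.sin(angles)
    out[1::2] = x1 * np.sin(angles) + x2 * np.cos(angles)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)

for m in (0, 10, 1000):
    additive = x + sinusoidal_pe(m, 8)   # norm drifts with m
    rotary = rope(x, m)                  # norm is always ||x||
    print(m, round(np.linalg.norm(additive), 3), round(np.linalg.norm(rotary), 3))
```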