Positional encoding is used so that each word can carry some information about its position in the sentence
We want the model to treat words that appear close to each other as “close” and words that are distant as “distant”
The goal is to encode positions with a regular pattern that the model can learn
This is solved by positional embeddings
For the even dimensions (indices 2i) of the encoding vector this formula is used: PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
For the odd dimensions (indices 2i+1) this formula is used: PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
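As a small worked example (assuming an illustrative d_model = 4): the word at position pos = 1 gets the vector [sin(1/10000^(0/4)), cos(1/10000^(0/4)), sin(1/10000^(2/4)), cos(1/10000^(2/4))] = [sin(1), cos(1), sin(0.01), cos(0.01)] ≈ [0.841, 0.540, 0.010, 1.000]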
This is repeated for every dimension up to the embedding size (d_model), and the resulting positional encoding vector is added element-wise to the input embedding
Positional embeddings are calculated only ONCE; they don’t depend on the content of the sentence, so they’re the same for every sentence and can be reused during training and inference
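A minimal sketch of how this could look in code, assuming NumPy and illustrative values for seq_len and d_model (the function and variable names below are not from the original text):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Precompute the (seq_len, d_model) positional encoding matrix once."""
    positions = np.arange(seq_len)[:, np.newaxis]        # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]  # the 2i indices, shape (1, d_model/2)
    angles = positions / np.power(10000.0, even_dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)  # cosine on odd dimensions
    return pe

# The same matrix is reused for every sentence: it is simply added
# element-wise to the token embeddings before the first layer.
seq_len, d_model = 10, 512                            # illustrative sizes
token_embeddings = np.random.randn(seq_len, d_model)  # stand-in for real embeddings
model_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because the matrix depends only on the position and the dimension index, it can be computed once at startup and added to every batch of embeddings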