Positional encoding is used so that each token carries some information about its position in the sentence

We want the model to treat words that appear near each other in the sentence as “close” and words that are far apart as “distant”

The goal is to encode positions with a regular pattern that the model can learn to exploit

This is solved by Positional Embeddings

For the even dimensions (index 2i) of the embedding vector this formula is used:

PE(pos, 2i) = sin(pos / 10000^(2i / d_model))

For the odd dimensions (index 2i + 1) this formula is used:

PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))

This alternating pattern continues across all d_model dimensions of the vector, and the resulting positional encoding is added element-wise to the input embedding (not concatenated)
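As a rough illustration, here is a minimal NumPy sketch of how such a sinusoidal positional encoding could be built and combined with the token embeddings; the function name and the example values of seq_len and d_model are assumptions made for this sketch, not taken from the text:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]      # token positions: shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]     # even dimension indices 2i
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions (2i)
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions (2i + 1)
    return pe

# The positional encoding is ADDED to the token embeddings, not concatenated:
seq_len, d_model = 10, 512
token_embeddings = np.random.randn(seq_len, d_model)   # placeholder embeddings
model_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```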

Positional embeddings are calculated only ONCE; they don’t depend on the tokens themselves, so they’re the same for every sentence
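Because the encodings never change per sentence, one option (continuing the sketch above, with an assumed maximum sequence length of 4096) is to precompute the table a single time and slice it for each input:

```python
# Precompute once for the longest sequence the model will see, then reuse it.
MAX_LEN = 4096                                            # assumed maximum sequence length
PE_TABLE = sinusoidal_positional_encoding(MAX_LEN, 512)   # computed a single time

def add_positional_encoding(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (seq_len, d_model); the same cached table serves every sentence."""
    return embeddings + PE_TABLE[: embeddings.shape[0]]
```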