Paper Link: https://arxiv.org/pdf/1706.03762

## Concepts

- Transformer Encoder
- Transformer Decoder
- Scaled Dot-Product Attention
- Multi-Head Attention
- Self-Attention
- Positional Encoding
- Layer Normalization
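
Since Scaled Dot-Product Attention is the core operation the other concepts build on, here is a minimal NumPy sketch of the formula the paper defines, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The function name, shapes, and example inputs are illustrative assumptions, not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v).

    Illustrative sketch of Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    """
    d_k = Q.shape[-1]
    # Dot-product similarity scores, scaled by sqrt(d_k) so that the softmax
    # does not saturate when d_k is large.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax: each query's weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Tiny usage example with random inputs (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))    # 6 keys
V = rng.normal(size=(6, 16))   # 6 values, d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```

In the paper, Multi-Head Attention runs several of these attention functions in parallel over learned linear projections of Q, K, and V, and Self-Attention is the case where Q, K, and V all come from the same sequence.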