Ayush Garg


          Self Attention

          Mar 29, 2025, 1 min read

          Self Attention is an application of Scaled Dot-Product Attention

Self Attention is when you apply Scaled Dot-Product Attention to a single sequence, using the same input for Q, K, and V.

Self attention lets each word “look at” the other words in the sequence to decide what’s important, using Scaled Dot-Product Attention. So:

$$\text{SelfAttention}(X) = \text{Attention}(Q = XW_Q,\; K = XW_K,\; V = XW_V)$$

where $W_Q, W_K, W_V$ are learned weight matrices.
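The formula above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not an optimized implementation; the dimensions and weight initialization are made up for the toy example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    # The same input X is projected into queries, keys, and values
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = Q.shape[-1]
    # Scaled dot-product attention: each row of `weights` sums to 1
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: sequence of 4 tokens, model/head dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_Q, W_K, W_V = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_Q, W_K, W_V)
print(out.shape)  # (4, 8): one output vector per token
```

Each output row is a weighted average of the value vectors, with weights determined by how strongly that token’s query matches every token’s key.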

Created by Ayush Garg using Quartz, © 2026