Transformer
From Single-Head to Multi-Head
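
A single attention head computes scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, once over the full model width; multi-head attention runs h smaller heads in parallel on learned low-dimensional projections and concatenates their outputs. Below is a minimal NumPy sketch of both; the dimensions, head count, and random weights are illustrative assumptions, not values from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

# Single head: one attention over the full d_model-dimensional vectors.
n, d_model, h = 4, 8, 2          # sequence length, model width, head count
X = rng.normal(size=(n, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))
single = attention(X @ Wq, X @ Wk, X @ Wv)    # (n, d_model)

# Multi-head: h heads of width d_model // h, each attending in its own
# projected subspace; results are concatenated and projected back.
d_head = d_model // h
heads = []
for i in range(h):
    Wq_i = rng.normal(size=(d_model, d_head))
    Wk_i = rng.normal(size=(d_model, d_head))
    Wv_i = rng.normal(size=(d_model, d_head))
    heads.append(attention(X @ Wq_i, X @ Wk_i, X @ Wv_i))
Wo = rng.normal(size=(d_model, d_model))      # output projection
multi = np.concatenate(heads, axis=-1) @ Wo   # (n, d_model)
```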

Self-Attention Layer + Dense Layer
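
Here each layer pairs self-attention, which mixes information across positions, with a dense layer applied position-wise, meaning the same weights are applied to every position's vector. A minimal sketch, assuming a ReLU dense block and toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Queries, keys, and values all come from the same sequence X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

n, d = 4, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
C = self_attention(X, Wq, Wk, Wv)   # contextualized vectors, (n, d)

# Position-wise dense layer: the same weights at every position.
W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
U = np.maximum(0, C @ W1 + b1) @ W2 + b2   # ReLU dense block, (n, d)
```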

Stacked Self-Attention Layers
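
Because each such layer maps an (n, d) sequence to an (n, d) sequence, layers compose: the output of layer l is the input of layer l+1. A sketch of an L-layer stack, with each layer being the attention-plus-dense unit from the previous sketch, redefined compactly here with fresh weights per layer (depth and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer(X, params):
    Wq, Wk, Wv, W1, W2 = params
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    C = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V   # self-attention
    return np.maximum(0, C @ W1) @ W2                 # dense sub-layer

n, d, L = 4, 8, 3
X = rng.normal(size=(n, d))
stack = [tuple(rng.normal(size=s) for s in
               [(d, d), (d, d), (d, d), (d, 4 * d), (4 * d, d)])
         for _ in range(L)]

H = X
for params in stack:        # output of layer l feeds layer l+1
    H = layer(H, params)    # shape stays (n, d), so layers compose
```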

Transformer's Encoder
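
The encoder block of the original Transformer (Vaswani et al., 2017) has two sub-layers, multi-head self-attention and a position-wise feed-forward network, each wrapped in a residual connection followed by layer normalization. A minimal single-block sketch with random weights and toy sizes; positional encoding is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def multi_head(X, heads, Wo):
    outs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        outs.append(softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V)
    return np.concatenate(outs, axis=-1) @ Wo

def encoder_block(X, heads, Wo, W1, W2):
    # Sub-layer 1: multi-head self-attention + residual + layer norm.
    X = layer_norm(X + multi_head(X, heads, Wo))
    # Sub-layer 2: position-wise feed-forward + residual + layer norm.
    return layer_norm(X + np.maximum(0, X @ W1) @ W2)

n, d, h = 4, 8, 2
d_head = d // h
X = rng.normal(size=(n, d))
heads = [tuple(rng.normal(size=(d, d_head)) for _ in range(3))
         for _ in range(h)]
Wo = rng.normal(size=(d, d))
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
Z = encoder_block(X, heads, Wo, W1, W2)   # (n, d)
```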

Transformer's Decoder: One Block
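
One decoder block adds a third sub-layer: masked self-attention over the target sequence (a causal mask blocks attention to future positions), then cross-attention whose queries come from the decoder and whose keys and values come from the encoder output, then the feed-forward network. The sketch below uses single-head attention for brevity; the real block is multi-head, and all weights here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def attention(Q, K, V, mask=None):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # block masked positions
    return softmax(scores) @ V

def decoder_block(Y, Z, p):
    m = Y.shape[0]
    causal = np.tril(np.ones((m, m), dtype=bool))   # no peeking ahead
    # 1. Masked self-attention over the decoder's own inputs.
    Y = layer_norm(Y + attention(Y @ p["Wq1"], Y @ p["Wk1"], Y @ p["Wv1"],
                                 causal))
    # 2. Cross-attention: queries from decoder, keys/values from encoder.
    Y = layer_norm(Y + attention(Y @ p["Wq2"], Z @ p["Wk2"], Z @ p["Wv2"]))
    # 3. Position-wise feed-forward.
    return layer_norm(Y + np.maximum(0, Y @ p["W1"]) @ p["W2"])

m, n, d = 3, 4, 8
Y = rng.normal(size=(m, d))   # decoder inputs (shifted targets)
Z = rng.normal(size=(n, d))   # encoder output
p = {k: rng.normal(size=(d, d)) for k in
     ["Wq1", "Wk1", "Wv1", "Wq2", "Wk2", "Wv2"]}
p["W1"], p["W2"] = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
out = decoder_block(Y, Z, p)  # (m, d)
```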


Putting It Together: The Transformer
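
End to end: an encoder stack turns the embedded source sequence into representations Z, a decoder stack attends to Z while processing the shifted target sequence, and a final linear layer plus softmax yields next-token probabilities. This sketch assumes encoder_block, decoder_block, and softmax from the sketches above are already in scope; the sizes and vocabulary are toy assumptions, and embeddings are stand-ins for token embeddings plus positional encodings:

```python
import numpy as np

# Assumes encoder_block, decoder_block, and softmax from the sketches
# above are already defined in this session.
rng = np.random.default_rng(5)

n, m, d, h, L, vocab = 4, 3, 8, 2, 2, 50
d_head = d // h

def enc_params():
    heads = [tuple(rng.normal(size=(d, d_head)) for _ in range(3))
             for _ in range(h)]
    return (heads, rng.normal(size=(d, d)),
            rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))

def dec_params():
    p = {k: rng.normal(size=(d, d)) for k in
         ["Wq1", "Wk1", "Wv1", "Wq2", "Wk2", "Wv2"]}
    p["W1"], p["W2"] = (rng.normal(size=(d, 4 * d)),
                        rng.normal(size=(4 * d, d)))
    return p

X = rng.normal(size=(n, d))   # embedded source tokens (+ positions)
Y = rng.normal(size=(m, d))   # embedded target tokens, shifted right

Z = X
for prm in [enc_params() for _ in range(L)]:   # encoder stack
    Z = encoder_block(Z, *prm)

H = Y
for p in [dec_params() for _ in range(L)]:     # decoder stack over Z
    H = decoder_block(H, Z, p)

W_out = rng.normal(size=(d, vocab))
probs = softmax(H @ W_out)    # next-token distribution per position
```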
