Simple RNN + Self Attention
Initialization: c_0 = 0, h_0 = 0.
Simple RNN: h_i = tanh(A ⋅ [x_i; h_{i−1}] + b)
Simple RNN + Self Attention: h_i = tanh(A ⋅ [x_i; c_{i−1}] + b)
Calculate weights: α_j = align(h_j, h_i), for j = 1, …, i
Context vector (weighted average of the states so far): c_i = α_1 h_1 + ⋯ + α_i h_i
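The recurrence can be made concrete in a few lines of NumPy. The sketch below is a minimal illustration, not the paper's exact method: it assumes a dot-product align() with softmax-normalized weights (the slides leave align unspecified), and A, b, and the dimensions are placeholder parameters.

```python
import numpy as np

# One common choice of alignment score: inner product of the two states.
# This is an assumption; the slides do not fix align().
def align(h_j, h_i):
    return h_j @ h_i

def self_attention_rnn(xs, A, b):
    """Run the simple RNN + self-attention recurrence over inputs xs."""
    d = b.shape[0]                # hidden size
    hs = []                       # stored states h_1, ..., h_i
    c = np.zeros(d)               # c_0 = 0
    for x in xs:
        # h_i = tanh(A . [x_i; c_{i-1}] + b): the new state is computed
        # from the previous *context vector*, not from h_{i-1}.
        h = np.tanh(A @ np.concatenate([x, c]) + b)
        hs.append(h)
        # alpha_j = align(h_j, h_i) for j = 1..i, softmax-normalized
        # (the normalization is an assumed choice that makes the
        # weights sum to 1).
        scores = np.array([align(h_j, h) for h_j in hs])
        alphas = np.exp(scores - scores.max())
        alphas /= alphas.sum()
        # c_i = alpha_1 h_1 + ... + alpha_i h_i
        c = (alphas[:, None] * np.array(hs)).sum(axis=0)
    return hs

# Example usage with placeholder sizes: input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(4, 3 + 4))
b = np.zeros(4)
xs = rng.normal(size=(6, 3))      # a length-6 input sequence
states = self_attention_rnn(xs, A, b)
print(len(states), states[-1].shape)   # -> 6 (4,)
```

Because each context vector c_i is a weighted average over all earlier states, the state update always has a short path back to every position in the sequence, which is what makes forgetting less likely.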
Summary
With self-attention, the RNN is less likely to forget.
At each step, the model pays attention to the context most relevant to the new input.
Reference
Jianpeng Cheng, Li Dong, and Mirella Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.