Simple RNN: $h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ h_{i-1} \end{bmatrix} + b\right)$
Simple RNN + Self-Attention: $h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ c_{i-1} \end{bmatrix} + b\right)$
With self-attention, the RNN is less likely to forget earlier inputs: at each step it attends to the part of the context most relevant to the new input, rather than relying only on the previous hidden state.
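The two recurrences above differ only in what is fed back alongside the new input: the previous hidden state $h_{i-1}$, or the attention context $c_{i-1}$. A minimal NumPy sketch of the shared update step (matrix sizes and random inputs are illustrative assumptions, not from the source):

```python
import numpy as np

def rnn_step(A, b, x, prev):
    """One update: h_i = tanh(A @ [x_i; prev] + b).
    `prev` is h_{i-1} for a simple RNN, or the context vector
    c_{i-1} when self-attention is added."""
    return np.tanh(A @ np.concatenate([x, prev]) + b)

# Hypothetical dimensions: input size 4, hidden size 3.
d_x, d_h = 4, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(d_h, d_x + d_h))  # one shared parameter matrix
b = np.zeros(d_h)

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):    # a sequence of 5 inputs
    h = rnn_step(A, b, x, h)           # simple RNN: feed h_{i-1} back in
```

With self-attention, the loop would instead pass the context vector computed over all stored hidden states (see the weight calculation below) as `prev`.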
Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.
Calculate weights: $\alpha_i = \operatorname{align}(h_i, h_2)$
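The weights $\alpha_i$ score every stored hidden state against the current one, and the context vector is their weighted average. A sketch using dot-product alignment with a softmax (the paper learns the alignment function; the dot product here is a simplifying assumption):

```python
import numpy as np

def attention_context(H, h_t):
    """H: stacked hidden states h_1..h_n, one per row.
    h_t: the current hidden state to align against.
    Returns the weights alpha and the context vector c_t."""
    scores = H @ h_t                       # align(h_i, h_t) as a dot product
    alpha = np.exp(scores - scores.max())  # shift for numerical stability
    alpha /= alpha.sum()                   # softmax over positions
    c = alpha @ H                          # c_t = sum_i alpha_i * h_i
    return alpha, c

# Toy example: three stored states of dimension 2.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
alpha, c = attention_context(H, np.array([1.0, 1.0]))
```

The state most similar to `h_t` (the third row) receives the largest weight, so the context `c` leans toward it.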