Self Attention

Simple RNN + Self Attention

c_0 = 0, \quad h_0 = 0

Simple RNN: h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ h_{i-1} \end{bmatrix} + b\right)
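A minimal NumPy sketch of this update (the function name is mine; A and b are the trainable parameters from the formula, and x_i, h_{i-1} are assumed to be 1-D vectors):

```python
import numpy as np

def simple_rnn_step(x_i, h_prev, A, b):
    """One Simple RNN step: h_i = tanh(A @ [x_i; h_{i-1}] + b)."""
    z = np.concatenate([x_i, h_prev])  # stack input and previous hidden state
    return np.tanh(A @ z + b)
```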

Simple RNN + Self Attention: h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ c_{i-1} \end{bmatrix} + b\right)

Calculate weights: \alpha_i = \operatorname{align}(h_i, h_t) for i = 1, \dots, t. The context vector is the weighted average c_t = \sum_{i=1}^{t} \alpha_i \, h_i, which replaces the hidden state in the next update (see the sketch below).
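Putting the pieces together, here is a sketch of one forward pass of Simple RNN + Self Attention in NumPy. The dot-product scoring inside align and the softmax normalization are illustrative assumptions (a common choice, not necessarily the alignment used in the reference below), and all sizes in the example are arbitrary:

```python
import numpy as np

def softmax(s):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(s - s.max())
    return e / e.sum()

def self_attention_rnn(xs, A, b, hidden_dim):
    """Simple RNN + Self Attention over a sequence xs (one vector per step).

    h_i = tanh(A @ [x_i; c_{i-1}] + b), where c_{i-1} is the context vector:
    a weighted average of the hidden states so far, with weights
    alpha_j = align(h_j, h_i) (dot-product scoring is an assumed choice).
    """
    c = np.zeros(hidden_dim)            # c_0 = 0
    hs = []                             # stored hidden states h_1, ..., h_t
    for x in xs:
        z = np.concatenate([x, c])      # feed the context instead of h_{i-1}
        h = np.tanh(A @ z + b)
        hs.append(h)
        scores = np.array([hj @ h for hj in hs])     # align(h_j, h_t)
        alpha = softmax(scores)                      # attention weights
        c = sum(a * hj for a, hj in zip(alpha, hs))  # new context vector c_t
    return np.stack(hs)

# Example with assumed sizes: 5 steps, input dim 4, hidden dim 3.
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 4))
A = rng.normal(size=(3, 4 + 3))         # maps [x_i; c_{i-1}] to hidden_dim
b = np.zeros(3)
print(self_attention_rnn(xs, A, b, hidden_dim=3).shape)   # (5, 3)
```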

Summary

  • With self-attention, the RNN is less likely to forget earlier inputs.

  • At each step, the model attends to the context most relevant to the new input.

Reference

  • Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.
