Simple RNN + Self Attention
Initialization: c_0 = 0, h_0 = 0.
Simple RNN: h_i = tanh(A ⋅ [x_i; h_{i−1}] + b)
Simple RNN + Self Attention: h_i = tanh(A ⋅ [x_i; c_{i−1}] + b)
Calculate weights: α_j = align(h_j, h_i), for j = 1, …, i
Context vector (weighted average of the states so far): c_i = α_1 h_1 + ⋯ + α_i h_i
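The recurrence can be made concrete in a few lines of NumPy. The sketch below is a minimal illustration, not the paper's exact method: it assumes a dot-product align() with softmax-normalized weights (the slides leave align unspecified), and A, b, and the dimensions are placeholder parameters.

```python
import numpy as np

# One common choice of alignment score: inner product of the two states.
# This is an assumption; the slides do not fix align().
def align(h_j, h_i):
    return h_j @ h_i

def self_attention_rnn(xs, A, b):
    """Run the simple RNN + self-attention recurrence over inputs xs."""
    d = b.shape[0]                # hidden size
    hs = []                       # stored states h_1, ..., h_i
    c = np.zeros(d)               # c_0 = 0
    for x in xs:
        # h_i = tanh(A . [x_i; c_{i-1}] + b): the new state is computed
        # from the previous *context vector*, not from h_{i-1}.
        h = np.tanh(A @ np.concatenate([x, c]) + b)
        hs.append(h)
        # alpha_j = align(h_j, h_i) for j = 1..i, softmax-normalized
        # (the normalization is an assumed choice that makes the
        # weights sum to 1).
        scores = np.array([align(h_j, h) for h_j in hs])
        alphas = np.exp(scores - scores.max())
        alphas /= alphas.sum()
        # c_i = alpha_1 h_1 + ... + alpha_i h_i
        c = (alphas[:, None] * np.array(hs)).sum(axis=0)
    return hs

# Example usage with placeholder sizes: input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(4, 3 + 4))
b = np.zeros(4)
xs = rng.normal(size=(6, 3))      # a length-6 input sequence
states = self_attention_rnn(xs, A, b)
print(len(states), states[-1].shape)   # -> 6 (4,)
```

Because each context vector c_i is a weighted average over all earlier states, the state update always has a short path back to every position in the sequence, which is what makes forgetting less likely.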
Summary
With self-attention, the RNN is less likely to forget.
At each step, the model pays attention to the context most relevant to the new input.
Reference
Jianpeng Cheng, Li Dong, and Mirella Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.