Attention

Shortcoming of Seq2Seq

Seq2Seq compresses the whole source sentence into the encoder's final state, which is incapable of remembering a long sequence.

Seq2Seq Model with Attention

  • Attention tremendously improves the Seq2Seq model.

  • With attention, the Seq2Seq model does not forget the source input.

  • With attention, the decoder knows where to focus.

  • Downside: much more computation.

Simple RNN + Attention

The decoder computes a weight α_i for every encoder state h_i, measuring how relevant h_i is to the decoder's current state s_0.

Option 1 (used in the original paper); a NumPy sketch of the three steps follows the list:

  1. Linear maps: k_i = W_K · h_i for each encoder state h_i (i = 1, …, m), and q = W_Q · s_0 for the decoder's current state.

  2. Inner product: e_i = k_i^T q, giving one scalar score per encoder state.

  3. Normalization: [α_1, …, α_m] = Softmax([e_1, …, e_m]), so the weights are non-negative and sum to 1.
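A minimal NumPy sketch of these three steps; the array names (h for the m encoder states, s for the decoder's current state, W_K and W_Q for the two learned projections) and their shapes are illustrative assumptions, not taken from the lecture:

    import numpy as np

    def attention_weights(h, s, W_K, W_Q):
        # h: (m, d_h) encoder states; s: (d_s,) decoder state
        # W_K: (d_k, d_h), W_Q: (d_k, d_s) learned projection matrices
        k = h @ W_K.T                 # 1. linear maps: k_i = W_K · h_i
        q = W_Q @ s                   #    and q = W_Q · s_0
        e = k @ q                     # 2. inner product: e_i = k_i^T q
        e = e - e.max()               # subtract max for numerical stability
        alpha = np.exp(e)
        return alpha / alpha.sum()    # 3. normalization: softmax over the m scores

The returned vector alpha holds one weight per encoder state and sums to 1.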

Calculate the next state

  • Context vector: c_0 = α_1·h_1 + … + α_m·h_m, a weighted average of all the encoder states.

  • The decoder's next state s_1 is computed from the current decoder input, the current state s_0, and the context vector c_0, so every decoding step can look back at the whole source sentence.
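A minimal NumPy sketch of one decoder update with attention; the parameter names A and b (the SimpleRNN weight matrix and bias) and all shapes are illustrative assumptions:

    import numpy as np

    def decoder_step(alpha, h, s, x, A, b):
        # alpha: (m,) attention weights; h: (m, d_h) encoder states
        # s: (d_s,) current decoder state; x: (d_x,) current decoder input
        # A: (d_s, d_x + d_s + d_h), b: (d_s,) SimpleRNN parameters
        c = alpha @ h                          # context vector: weighted sum of encoder states
        z = np.concatenate([x, s, c])          # concatenate [input; state; context]
        s_next = np.tanh(A @ z + b)            # next decoder state
        return s_next, c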

Time complexity

  • Computing one context vector requires m attention weights, one per encoder state.

  • Decoding a target sequence of length t therefore computes m·t weights in total (m = source length). For example, a 50-word source and a 50-word target require 2,500 weights.

  • This is the extra cost relative to the standard Seq2Seq decoder, which computes no such weights.

Weights Visualization

  • Plotting the matrix of attention weights shows, for each generated target word, which source words it attends to.

  • The large weights form a soft alignment between the source and target sentences (see the alignment figures in Bahdanau et al., 2015).

Summary

  • Standard Seq2Seq model: the decoder looks only at its current state.

  • Attention: the decoder additionally looks at all the states of the encoder.

  • Attention: the decoder knows where to focus.

  • Downside: higher time complexity.

References:

  1. Bahdanau, D., Cho, K., & Bengio, Y. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
