Self Attention

Simple RNN + Self Attention

$c_0 = 0, \quad h_0 = 0$

Simple RNN: $h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ h_{i-1} \end{bmatrix} + b\right)$

Simple RNN + Self Attention: $h_i = \tanh\left(A \cdot \begin{bmatrix} x_i \\ c_{i-1} \end{bmatrix} + b\right)$

Calculate weights (shown here for step 2): $\alpha_i = \operatorname{align}(h_i, h_2)$ for $i = 1, 2$; the new context vector is then the weighted average of the hidden states so far, $c_2 = \sum_i \alpha_i h_i$.
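
Below is a minimal NumPy sketch of this recurrence, not from the original lecture: the hidden-state update feeds the attention context $c_{i-1}$ back in place of $h_{i-1}$, and a plain dot-product score stands in for the `align` function (an assumption here); all names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def simple_rnn_self_attention(xs, A, b):
    """Simple RNN whose recurrent input is the attention context c
    instead of the previous hidden state h (toy sketch).

    xs : list of input vectors x_1..x_T, each of shape (d_x,)
    A  : weight matrix of shape (d_h, d_x + d_h)
    b  : bias vector of shape (d_h,)
    """
    d_h = b.shape[0]
    hs = []                  # stored hidden states h_1..h_t
    c = np.zeros(d_h)        # c_0 = 0

    for x in xs:
        # h_i = tanh(A . [x_i; c_{i-1}] + b)
        h = np.tanh(A @ np.concatenate([x, c]) + b)
        hs.append(h)

        # alpha_i = align(h_i, h_t); a dot-product score is used for illustration
        scores = np.array([h_j @ h for h_j in hs])
        alphas = softmax(scores)

        # c_t = sum_i alpha_i * h_i  (weighted average of hidden states so far)
        c = sum(a * h_j for a, h_j in zip(alphas, hs))

    return hs, c

# Toy usage: 3 input vectors of size 4, hidden size 5
rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(3)]
A = rng.normal(size=(5, 4 + 5))
b = np.zeros(5)
hs, c = simple_rnn_self_attention(xs, A, b)
print(len(hs), c.shape)   # 3 (5,)
```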

Summary

  • With self-attention, the RNN is less likely to forget earlier inputs.

  • At each step, the model attends to the parts of the context that are most relevant to the new input.

Reference

  • Cheng, Dong, & Lapata. Long Short-Term Memory-Networks for Machine Reading. In EMNLP, 2016.

[Figures: calculate h1 · calculate h2 · calculate c2 · self-attention focus]