Activation function


The activation function of a neuron is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary: either the neuron fires or it does not.

Important: the main reason for adding an activation function is that it introduces non-linearity into the model.
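
To see why this matters, here is a minimal NumPy sketch (an illustration of mine, not from the original page): without an activation in between, two linear layers collapse into a single linear layer, so depth adds no expressive power.

```python
# Sketch: stacking linear layers without an activation is still linear.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # a small batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5))   # weights of layer 1
W2 = rng.normal(size=(5, 2))   # weights of layer 2

two_layers = (x @ W1) @ W2     # layer 1 then layer 2, no non-linearity
one_layer = x @ (W1 @ W2)      # an equivalent single linear layer

print(np.allclose(two_layers, one_layer))  # True
```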

Common activation functions for neural networks (a NumPy sketch of these functions follows the list below):

  • Sigmoid Function: $f(x) = \frac{1}{1 + e^{-x}}$

    • Sigmoid non-linearity squashes real numbers into the range [0, 1].

    • Sigmoids saturate and kill gradients: the gradient is large only near $x = 0$ and becomes nearly zero when $|x|$ is large.

    • Sigmoid outputs are not zero-centered.

  • Tanh function: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

    • It squashes a real-valued number to the range [-1, 1].

    • Like the sigmoid, its activations saturate.

    • Unlike the sigmoid, its output is zero-centered.

  • ReLU function: $f(x) = \max(0, x)$, or $f(x) = \min(6, \max(0, x))$ for ReLU6

    • It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.

    • Compared to tanh/sigmoid neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero.

    • ReLU units can be fragile during training and can "die": a large gradient update can push the weights into a regime where the unit never activates again.

  • Leaky ReLU function:

    if $x \ge 0$, $f(x) = x$; otherwise, $f(x) = ax$, where $a$ is a small positive constant.

    • Reduces the risk of ReLU units dying during training.

  • Multi-class classification: softmax

    • $p_{o,c} = \frac{e^{y_{c}}}{\sum_{c'=1}^{M} e^{y_{c'}}}$, the predicted probability that observation $o$ belongs to class $c$, with $M$ classes and logits $y$.
  • Binary classification: sigmoid

  • Regression: linear
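
A minimal NumPy sketch of the activations listed above (an illustration; the function names and test values are mine, not the book's code):

```python
# Minimal NumPy sketches of the activations listed above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def relu6(x):
    return np.minimum(6.0, np.maximum(0.0, x))

def leaky_relu(x, a=0.01):
    # slope a on the negative side keeps a small gradient flowing
    return np.where(x >= 0, x, a * x)

def softmax(y):
    # subtract the max logit for numerical stability; output sums to 1
    e = np.exp(y - np.max(y))
    return e / e.sum()

x = np.array([-3.0, -0.5, 0.0, 2.0, 8.0])
print(relu6(x))          # [0. 0. 0. 2. 6.]
print(softmax(x).sum())  # 1.0
```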

Derivatives (figures from the original page are not reproduced here).
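
For reference, the standard derivatives of these activations (stated here because the original figures are not reproduced):

  • Sigmoid: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which is at most 0.25 and vanishes for large $|x|$
  • Tanh: $\tanh'(x) = 1 - \tanh^{2}(x)$
  • ReLU: $f'(x) = 1$ for $x > 0$ and $0$ for $x < 0$
  • Leaky ReLU: $f'(x) = 1$ for $x \ge 0$ and $a$ otherwise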