Activation function
The activation function is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary: either the neuron is firing or it is not.
Important: the key purpose of an activation function is to add non-linearity to the model; without it, stacked linear layers collapse into a single linear layer (as shown below).
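A minimal NumPy sketch of that point (variable names are illustrative): two linear layers with no activation in between compute exactly the same function as one linear layer.

```python
import numpy as np

# Without a non-linearity, stacking linear layers adds no expressive power:
# x @ W1 @ W2 is itself just one linear map with weights W1 @ W2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))       # a small batch of 4 inputs, 3 features each
W1 = rng.normal(size=(3, 5))      # "layer 1" weights
W2 = rng.normal(size=(5, 2))      # "layer 2" weights

two_layers = x @ W1 @ W2          # two stacked linear layers, no activation
one_layer = x @ (W1 @ W2)         # a single equivalent linear layer

print(np.allclose(two_layers, one_layer))  # True
```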
For neural networks
Sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
The sigmoid non-linearity squashes real numbers into the range [0, 1].
Sigmoids saturate and kill gradients: when $|x|$ is small the gradient is large, but when $|x|$ is large the gradient is nearly zero (see the sketch below).
Sigmoid outputs are not zero-centered.
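A minimal NumPy sketch (function names are my own) showing the saturation numerically: the sigmoid's gradient $\sigma(x)(1 - \sigma(x))$ peaks at 0.25 near $x = 0$ and vanishes for large $|x|$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative of the sigmoid

for x in [0.0, 2.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
# x=  0.0  sigmoid=0.5000  grad=0.250000
# x=  2.0  sigmoid=0.8808  grad=0.104994
# x= 10.0  sigmoid=1.0000  grad=0.000045
```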
Tanh function: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1$
It squashes a real-valued number to the range [-1, 1]
Its activations still saturate, like the sigmoid's.
Its output is zero-centered (see the comparison below).
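A small sketch (NumPy only, assumed variable names) contrasting the two: tanh output is symmetric about zero while sigmoid output is always positive, and tanh is a scaled, shifted sigmoid.

```python
import numpy as np

x = np.linspace(-3, 3, 7)
print("tanh   :", np.round(np.tanh(x), 3))            # symmetric about 0
print("sigmoid:", np.round(1 / (1 + np.exp(-x)), 3))  # all values in (0, 1)

# tanh is just a rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2 / (1 + np.exp(-2 * x)) - 1))  # True
```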
ReLU function: $f(x) = \max(0, x)$, or $f(x) = \min(\max(0, x), 6)$ for ReLU6
It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
Compared to tanh/sigmoid neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
ReLU units can be fragile during training and can die: a large gradient update can push the weights so that the unit never activates again, after which its gradient is zero forever.
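A minimal sketch of both variants (plain NumPy, names are illustrative), showing that each is just elementwise thresholding with no exponentials involved.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # threshold at zero

def relu6(x):
    return np.minimum(relu(x), 6.0)    # ReLU additionally capped at 6

x = np.array([-2.0, 0.0, 3.0, 8.0])
print(relu(x))   # [0. 0. 3. 8.]
print(relu6(x))  # [0. 0. 3. 6.]
```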
Leaky ReLU function:
If $x \geq 0$, $f(x) = x$; else, $f(x) = \alpha x$ (with $\alpha$ a small constant such as 0.01)
Reduces the dying-ReLU problem by keeping a small non-zero gradient for negative inputs (see the sketch below).
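A sketch assuming the common default $\alpha = 0.01$ (the slope is a hyperparameter, not fixed by the definition): the gradient never hits exactly zero, so the unit can keep learning.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    return np.where(x >= 0, 1.0, alpha)   # never exactly zero

x = np.array([-5.0, -0.5, 0.0, 2.0])
print(leaky_relu(x))       # [-0.05  -0.005  0.     2.   ]
print(leaky_relu_grad(x))  # [0.01 0.01 1.   1.  ]
```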
For output layers
Multi-class classification: softmax (see derivative)
Binary classification: sigmoid
Regression: linear (identity)
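A minimal sketch of these three output choices (NumPy only; the max-subtraction in softmax is the standard numerical-stability trick).

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)  # shift by max for stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))            # multi-class: probabilities sum to 1

logit = 0.7
print(1 / (1 + np.exp(-logit)))   # binary: sigmoid squashes one logit to (0, 1)
# regression: no activation at all -- the raw linear output is the prediction
```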