Activation function
The activation function is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary: either the neuron is firing or it is not.
Important: The main reason to add an activation function is that it introduces non-linearity into the model. Without it, a stack of linear layers collapses into a single linear transformation.
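To make the non-linearity point concrete, here is a minimal sketch using scalar "layers". The weights, biases, and function names are made up for illustration. It shows that two linear layers with nothing in between equal one linear layer, while putting a ReLU between them gives a function that is no longer linear.

```python
def linear(w, b, x):
    """Scalar linear layer: w * x + b."""
    return w * x + b

def relu(x):
    """ReLU non-linearity: max(0, x)."""
    return max(0.0, x)

def two_linear(x):
    # Two stacked linear layers with no activation in between.
    return linear(3.0, -1.0, linear(2.0, 1.0, x))

def collapsed(x):
    # The same mapping as a single layer: 3 * (2x + 1) - 1 = 6x + 2.
    return linear(6.0, 2.0, x)

def two_with_relu(x):
    # The same two layers with a ReLU between them.
    return linear(3.0, -1.0, relu(linear(2.0, 1.0, x)))

for x in (-2.0, 0.0, 2.0):
    print(x, two_linear(x), collapsed(x), two_with_relu(x))
```

For any linear (affine) function f, f(-2) + f(2) equals 2 * f(0). The ReLU version breaks this identity, which is one simple way to see that it is not a single linear layer.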
For neural networks
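Sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
The sigmoid non-linearity squashes real numbers into the range [0, 1].
Sigmoids saturate and kill gradients: when $|x|$ is large, the output is close to 0 or 1 and the gradient is close to zero. The gradient is largest (0.25) at $x = 0$.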
Sigmoid outputs are not zero-centered.
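The sigmoid properties above can be checked numerically. This is a minimal sketch in plain Python; the helper names `sigmoid` and `sigmoid_grad` are chosen for illustration, and the gradient uses the standard identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 and shrinks toward zero as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

Every output is positive, which is what "not zero-centered" refers to.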
Tanh function: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
It squashes a real-valued number to the range [-1, 1].
Like the sigmoid, its activations saturate.
Unlike the sigmoid, its output is zero-centered.
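A short sketch comparing tanh with the sigmoid on a symmetric set of inputs. The input values and the helper name `tanh_grad` are arbitrary choices for illustration. It uses the standard derivative $1 - \tanh^2(x)$.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

xs = [-3.0, -1.0, 0.0, 1.0, 3.0]  # symmetric around zero

# Mean activation over symmetric inputs: about 0 for tanh, about 0.5 for sigmoid.
tanh_mean = sum(math.tanh(x) for x in xs) / len(xs)
sigmoid_mean = sum(1.0 / (1.0 + math.exp(-x)) for x in xs) / len(xs)

print("tanh mean:   ", tanh_mean)
print("sigmoid mean:", sigmoid_mean)
print("tanh gradient at 0 and 5:", tanh_grad(0.0), tanh_grad(5.0))
```

The gradient at 5 is much smaller than at 0, which is the saturation behavior noted above.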
ReLU function: $f(x) = \max(0, x)$, or $f(x) = \min(\max(0, x), 6)$ for ReLU6
It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
Compared to tanh/sigmoid neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
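ReLU units can be fragile during training and can "die": a large gradient update can push the weights so far that the unit outputs zero for every input, after which its gradient is also zero and it stops learning.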
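The thresholding and the "dead unit" behavior can be sketched in a few lines. The weight `w`, bias `b`, and input batch below are invented for illustration. They are picked so that the pre-activation is negative for every input.

```python
def relu(x):
    """ReLU: threshold at zero."""
    return max(0.0, x)

def relu6(x):
    """ReLU6: ReLU capped at 6."""
    return min(max(0.0, x), 6.0)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

# A hypothetical unit whose bias has been pushed far into the negative range.
w, b = 0.5, -10.0
inputs = [-2.0, 0.0, 1.0, 3.0]
pre_activations = [w * x + b for x in inputs]

# Every pre-activation is negative, so both output and gradient are zero.
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]
unit_is_dead = all(g == 0.0 for g in grads)
print(outputs, grads, unit_is_dead)
```

With zero gradient on every example, gradient descent has no signal to move `w` or `b` back, so the unit stays at zero.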
Leaky ReLU function:
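if $x > 0$, $f(x) = x$; else, $f(x) = \alpha x$, where $\alpha$ is a small positive constant (for example, 0.01)
The small negative slope keeps the gradient non-zero for $x < 0$, which reduces the chance of ReLU units dying during training.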
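A minimal sketch of Leaky ReLU and its gradient. The default slope `alpha=0.01` is a commonly used value and is an assumption here, not something fixed by these notes.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha

# Unlike plain ReLU, the gradient for negative inputs is small but not zero.
for x in (-5.0, -1.0, 0.5, 3.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Because the gradient never reaches exactly zero, a unit with negative pre-activations can still receive updates.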
For the output layer
Multi-class: softmax, see derivative
Binary: sigmoid
Regression: linear
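The three task-specific choices above can be sketched as plain Python functions. The function names and sample logits are illustrative. The softmax subtracts the maximum logit before exponentiating, a common trick to avoid overflow that does not change the result.

```python
import math

def softmax(logits):
    """Multi-class: turn a list of scores into probabilities that sum to 1."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Binary: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):
    """Regression: identity, the raw output is the prediction."""
    return z

probs = softmax([2.0, 1.0, 0.1])
print("softmax:", probs, "sum:", sum(probs))
print("sigmoid:", sigmoid(0.0))
print("linear: ", linear(-3.7))
```

The largest logit gets the largest probability, and the ordering of the logits is preserved.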