Categorical Cross-Entropy Loss
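With one-hot labels only the positive class keeps its term in the loss, so the Categorical Cross-Entropy (Softmax) loss reduces to:

$$CE = -\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)$$

where $s_p$ is the CNN score for the positive class and the sum runs over the scores $s_j$ of all $C$ classes.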
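To make the loss concrete, here is a minimal pure-Python sketch of a Softmax activation followed by the Cross-Entropy loss for a single sample with a one-hot label. The function names and the example scores are made up for illustration; they do not come from any of the frameworks discussed here.

```python
import math

def softmax(scores):
    """Softmax over a list of raw class scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(scores, positive_index):
    """CE loss for one sample whose label is one-hot at positive_index."""
    probs = softmax(scores)
    # Only the positive class contributes to the loss
    return -math.log(probs[positive_index])

scores = [2.0, 1.0, 0.1]  # hypothetical CNN scores for 3 classes
loss = categorical_cross_entropy(scores, positive_index=0)
print(loss)
```

The loss only reads the Softmax output of the positive class, but that output depends on every score through the normalization term.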
Having defined the loss, we now have to compute its gradient with respect to the output neurons of the CNN, so that it can be backpropagated through the net and the loss can be minimized by tuning the net parameters. So we need to compute the gradient of the CE loss with respect to each CNN class score in s. The loss terms coming from the negative classes are zero. However, the loss gradient with respect to those negative classes does not vanish, since the Softmax of the positive class also depends on the negative class scores.
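After some calculus, the derivative with respect to the positive class score $s_p$ is:

$$\frac{\partial}{\partial s_p}\left(-\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)\right) = \frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}} - 1$$

And the derivative with respect to the score $s_n$ of any other (negative) class is:

$$\frac{\partial}{\partial s_n}\left(-\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)\right) = \frac{e^{s_n}}{\sum_{j}^{C} e^{s_j}}$$

In both cases the sum runs over the scores $s_j$ of all $C$ classes.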
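The two derivatives (the Softmax output minus one for the positive class, the plain Softmax output for each negative class) can be checked numerically. The sketch below compares them against a central finite-difference approximation. All names and the example scores are illustrative assumptions.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(scores, p):
    """CE loss for one sample whose positive class index is p."""
    return -math.log(softmax(scores)[p])

def ce_gradient(scores, p):
    """Analytic gradient: softmax - 1 for the positive class, softmax otherwise."""
    probs = softmax(scores)
    return [prob - 1.0 if i == p else prob for i, prob in enumerate(probs)]

def numeric_gradient(scores, p, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[i] += eps
        down[i] -= eps
        grad.append((ce_loss(up, p) - ce_loss(down, p)) / (2 * eps))
    return grad

scores = [2.0, 1.0, 0.1]  # hypothetical CNN scores for 3 classes
analytic = ce_gradient(scores, 0)
numeric = numeric_gradient(scores, 0)
print(analytic)
print(numeric)
```

The gradient is negative for the positive class and positive for the negative classes, so a gradient descent step raises the positive score and lowers the others.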
Caffe: SoftmaxWithLoss Layer. It is limited to multi-class classification.
PyTorch: CrossEntropyLoss. It is limited to multi-class classification.
TensorFlow: softmax_cross_entropy. It is limited to multi-class classification.
In this Facebook work they claim that, despite being counter-intuitive, Categorical Cross-Entropy loss (or Softmax loss) worked better than Binary Cross-Entropy loss in their multi-label classification problem.
→ Skip this part if you are not interested in Facebook or me using Softmax Loss for multi-label classification, which is not standard.
The gradient has different expressions for positive and negative classes. For positive classes:
For negative classes:
These expressions can be easily inferred from the single-label gradient expressions.
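As a sketch of how the multi-label expressions follow from the single-label ones, assume that a sample has $M$ positive classes and that its loss is the average of the single-label Softmax loss over those $M$ classes. Both the symbol $M$ and this averaging are assumptions made here for illustration. Each of the $M$ loss terms treats one positive class as its own positive and every other class as a negative, which gives:

```latex
% Assumed multi-label loss: average over the M positive classes p_1, ..., p_M
CE = \frac{1}{M} \sum_{i=1}^{M} -\log\left(\frac{e^{s_{p_i}}}{\sum_{j}^{C} e^{s_j}}\right)

% Positive class p_i: one term where it is the positive class,
% and M - 1 terms where it acts as a negative class
\frac{\partial CE}{\partial s_{p_i}}
  = \frac{1}{M}\left[\left(\frac{e^{s_{p_i}}}{\sum_{j}^{C} e^{s_j}} - 1\right)
  + (M - 1)\,\frac{e^{s_{p_i}}}{\sum_{j}^{C} e^{s_j}}\right]
  = \frac{e^{s_{p_i}}}{\sum_{j}^{C} e^{s_j}} - \frac{1}{M}

% Negative class n: it acts as a negative class in all M terms
\frac{\partial CE}{\partial s_{n}}
  = \frac{1}{M} \cdot M \cdot \frac{e^{s_{n}}}{\sum_{j}^{C} e^{s_j}}
  = \frac{e^{s_{n}}}{\sum_{j}^{C} e^{s_j}}
```

With $M = 1$ these reduce to the single-label derivatives: the Softmax output minus one for the positive class and the Softmax output for the negative classes.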
Since neither the Caffe SoftmaxWithLoss layer nor the Multinomial Logistic Loss layer accepts multi-label targets, I implemented my own PyCaffe Softmax loss layer, following the specifications of the Facebook paper. Caffe Python layers let us easily customize the operations done in the forward and backward passes of the layer:
Forward pass: Loss computation
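The following is a minimal pure-Python sketch of what such a forward pass could compute. It is not the PyCaffe layer itself. It assumes that every non-zero label marks a positive class, that each sample has at least one positive class, and that the per-sample loss is averaged over its positive classes and then over the batch. The function names and the example batch are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_softmax_loss(batch_scores, batch_labels):
    """Mean over the batch of the per-sample loss. Each per-sample loss is the
    average of -log(softmax) over the positive classes (labels != 0)."""
    total = 0.0
    for scores, labels in zip(batch_scores, batch_labels):
        probs = softmax(scores)
        positives = [c for c, t in enumerate(labels) if t != 0]
        scale = 1.0 / len(positives)  # assumes at least one positive class
        total += sum(-math.log(probs[c]) * scale for c in positives)
    return total / len(batch_scores)

# Hypothetical batch: 2 samples, 3 classes
batch_scores = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
batch_labels = [[1, 1, 0], [0, 0, 1]]
batch_loss = multilabel_softmax_loss(batch_scores, batch_labels)
print(batch_loss)
```

With a single positive class per sample this reduces to the usual Categorical Cross-Entropy loss.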
For the full code, take a look here.
Backward pass: Gradients computation
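A matching backward pass can be sketched in the same way. This is again an illustration and not the PyCaffe layer. It assumes the per-sample loss is the average of -log(softmax) over the M positive classes (labels != 0), so the gradient is the Softmax output minus 1/M for positive classes and the plain Softmax output for negative classes. The sketch handles one sample; dividing by the batch size is left out. A finite-difference check is included to confirm that the gradient matches the assumed loss.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_loss(scores, labels):
    """Average of -log(softmax) over the positive classes of one sample."""
    probs = softmax(scores)
    positives = [c for c, t in enumerate(labels) if t != 0]
    return sum(-math.log(probs[c]) for c in positives) / len(positives)

def sample_gradient(scores, labels):
    """Gradient of sample_loss with respect to each class score."""
    probs = softmax(scores)
    m = sum(1 for t in labels if t != 0)  # number of positive classes
    # Positive classes: softmax - 1/M. Negative classes: softmax.
    return [p - 1.0 / m if t != 0 else p for p, t in zip(probs, labels)]

def numeric_gradient(scores, labels, eps=1e-6):
    """Central finite differences, used only to check sample_gradient."""
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[i] += eps
        down[i] -= eps
        grad.append((sample_loss(up, labels) - sample_loss(down, labels)) / (2 * eps))
    return grad

scores = [2.0, 1.0, 0.1]  # hypothetical CNN scores for 3 classes
labels = [1, 1, 0]        # two positive classes, one negative class
analytic = sample_gradient(scores, labels)
numeric = numeric_gradient(scores, labels)
print(analytic)
print(numeric)
```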
The Caffe Python layer of this Softmax loss, which supports a multi-label setup with real-valued labels, is available here.