Binary Cross-Entropy Loss
Also called Sigmoid Cross-Entropy loss. It is a Sigmoid activation plus a Cross-Entropy loss. Unlike Softmax loss it is independent for each vector component (class), meaning that the loss computed for every CNN output vector component is not affected by other component values. That’s why it is used for multi-label classification, where the insight of an element belonging to a certain class should not influence the decision for another class. It’s called Binary Cross-Entropy Loss because it sets up a binary classification problem between $C' = 2$ classes for every class in $C$, as explained above. So when using this Loss, the formulation of Cross-Entropy Loss for binary problems is often used:

$$CE = -\sum_{i=1}^{C'=2} t_i \log(s_i) = -t_1 \log(s_1) - (1 - t_1) \log(1 - s_1)$$
The loss can be expressed as:

$$CE = \begin{cases} -\log(s_1) & \text{if } t_1 = 1 \\ -\log(1 - s_1) & \text{if } t_1 = 0 \end{cases}$$
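For instance, with an arbitrary sigmoid output of $s_1 = 0.7$ for some class (value chosen only for illustration), the two branches give:

$$CE = \begin{cases} -\log(0.7) \approx 0.357 & \text{if } t_1 = 1 \\ -\log(1 - 0.7) = -\log(0.3) \approx 1.204 & \text{if } t_1 = 0 \end{cases}$$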
Refer here for a detailed loss derivation.
Pytorch: BCEWithLogitsLoss
TensorFlow: sigmoid_cross_entropy.
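A minimal PyTorch sketch of how this loss is typically applied to raw CNN scores in a multi-label setting; the tensor shapes and values here are illustrative assumptions, not taken from the original post:

```python
import torch
import torch.nn as nn

# Illustrative multi-label batch: 2 samples, C = 4 classes.
# `logits` are raw CNN scores (no sigmoid applied yet);
# `targets` hold one binary label per class and per sample.
logits = torch.tensor([[1.2, -0.5, 0.3, 2.0],
                       [-1.0, 0.8, -0.2, 0.1]])
targets = torch.tensor([[1., 0., 1., 1.],
                        [0., 1., 0., 0.]])

# BCEWithLogitsLoss fuses the sigmoid activation and the binary
# cross-entropy, which is numerically more stable than applying
# nn.Sigmoid followed by nn.BCELoss.
criterion = nn.BCEWithLogitsLoss()   # averages over all binary problems
loss = criterion(logits, targets)
print(loss.item())
```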
This would be the pipeline for each one of the $C$ classes. We set $C$ independent binary classification problems ($C' = 2$). Then we sum up the loss over the different binary problems: we sum up the gradients of every binary problem to backpropagate, and the losses to monitor the global loss. $s_1$ and $t_1$ are the score and the ground truth label for the class $C_1$, which is also the class $C_i$ in $C$. $s_2 = 1 - s_1$ and $t_2 = 1 - t_1$ are the score and the ground truth label of the class $C_2$, which is not a "class" in our original problem with $C$ classes, but a class we create to set up the binary problem with $C_1 = C_i$. We can understand it as a background class.
Where $t_1 = 1$ means that the class $C_1 = C_i$ is positive for this sample.
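A short sketch of this per-class pipeline (scores and labels are made up for illustration): the loss of each $C' = 2$ binary problem is computed from the sigmoid score and summed over the $C$ classes, which should match PyTorch's fused implementation with `reduction="sum"`:

```python
import torch

def binary_ce(score, t1):
    """One C' = 2 binary problem: s1 = sigmoid(score), s2 = 1 - s1, t2 = 1 - t1."""
    s1 = torch.sigmoid(score)
    return -(t1 * torch.log(s1) + (1 - t1) * torch.log(1 - s1))

# Illustrative scores and labels for a single sample with C = 3 classes.
scores = torch.tensor([1.5, -0.7, 0.2])
labels = torch.tensor([1.0, 0.0, 1.0])

# Sum the loss of the C independent binary problems.
total = sum(binary_ce(s, t) for s, t in zip(scores, labels))

# Same value from the fused PyTorch implementation.
ref = torch.nn.functional.binary_cross_entropy_with_logits(
    scores, labels, reduction="sum")
print(total.item(), ref.item())   # both values should match
```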
In this case, the activation function does not depend on the scores of other classes in $C$, only on the score of its own class $C_1 = C_i$. So the gradient with respect to each score $s_i$ in $s$ will only depend on the loss given by its own binary problem.
The gradient with respect to the score $s_i = s_1$ can be written as:

$$\frac{\partial CE}{\partial s_1} = t_1 \left( f(s_1) - 1 \right) + (1 - t_1) f(s_1)$$
Where $f()$ is the sigmoid function. It can also be written as:

$$\frac{\partial CE}{\partial s_1} = \begin{cases} f(s_1) - 1 & \text{if } t_1 = 1 \\ f(s_1) & \text{if } t_1 = 0 \end{cases}$$
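A small autograd check (with an arbitrary score and label, chosen only for illustration) that this analytic expression matches the gradient PyTorch backpropagates through the fused sigmoid + cross-entropy:

```python
import torch

# Arbitrary raw score and label; requires_grad lets autograd compute
# the gradient we want to compare against the closed form.
s1 = torch.tensor(0.8, requires_grad=True)
t1 = torch.tensor(1.0)

loss = torch.nn.functional.binary_cross_entropy_with_logits(s1, t1)
loss.backward()

# Closed-form gradient: t1 * (f(s1) - 1) + (1 - t1) * f(s1), i.e. f(s1) - t1.
analytic = t1 * (torch.sigmoid(s1) - 1) + (1 - t1) * torch.sigmoid(s1)
print(s1.grad.item(), analytic.item())   # the two gradients should agree
```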