Binary Cross-Entropy Loss


Also called Sigmoid Cross-Entropy loss. It is a Sigmoid activation plus a Cross-Entropy loss. Unlike Softmax loss, it is independent for each vector component (class): the loss computed for every CNN output vector component is not affected by other component values. That's why it is used for multi-label classification, where the insight of an element belonging to a certain class should not influence the decision for another class. It's called Binary Cross-Entropy Loss because it sets up a binary classification problem between $C' = 2$ classes for every class in $C$, as explained above. So when using this loss, the formulation of Cross-Entropy Loss for binary problems is often used:

$$CE = -\sum_{i=1}^{C'=2} t_i \log(f(s_i)) = -t_1 \log(f(s_1)) - (1 - t_1) \log(1 - f(s_1))$$
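
For example, for a positive sample ($t_1 = 1$) predicted at $f(s_1) = 0.8$, only the first term survives and $CE = -\log(0.8) \approx 0.223$; a confident wrong prediction such as $f(s_1) = 0.1$ gives $CE = -\log(0.1) \approx 2.303$, so the loss grows sharply as the prediction moves away from the label.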

This would be the pipeline for each one of the $C$ classes. We set up $C$ independent binary classification problems ($C' = 2$). Then we sum up the loss over the different binary problems: we sum up the gradients of every binary problem to backpropagate, and the losses to monitor the global loss. $s_1$ and $t_1$ are the score and the ground-truth label for the class $C_1$, which is also the class $C_i$ in $C$. $s_2 = 1 - s_1$ and $t_2 = 1 - t_1$ are the score and the ground-truth label of the class $C_2$, which is not a "class" in our original problem with $C$ classes, but a class we create to set up the binary problem with $C_1 = C_i$. We can understand it as a background class.

The loss can be expressed as:

$$CE = \begin{cases} -\log(f(s_1)) & \text{if } t_1 = 1 \\ -\log(1 - f(s_1)) & \text{if } t_1 = 0 \end{cases}$$

Where $t_1 = 1$ means that the class $C_1 = C_i$ is positive for this sample.
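
To make this per-class pipeline concrete, here is a minimal NumPy sketch (the scores and labels are made up) that applies the piecewise loss independently to each of $C = 3$ classes and sums the resulting binary losses:

import numpy as np

s = np.array([2.0, -1.0, 0.5])  # hypothetical CNN scores for C = 3 classes
t = np.array([1.0, 0.0, 1.0])   # multi-hot ground truth: classes 1 and 3 are positive
f = 1 / (1 + np.exp(-s))        # sigmoid, applied to each class independently
# -log(f) where t = 1, -log(1 - f) where t = 0 (the combined formula above)
per_class = -t * np.log(f) - (1 - t) * np.log(1 - f)
print(per_class, per_class.sum())  # sum over the C independent binary problems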

In this case, the activation function does not depend on scores of classes in $C$ other than $C_1 = C_i$. So the gradient with respect to each score $s_i$ in $s$ will only depend on the loss given by its binary problem.

The gradient with respect to the score $s_i = s_1$ can be written as:

$$\frac{\partial}{\partial s_i} CE(f(s_i)) = t_1 (f(s_1) - 1) + (1 - t_1) f(s_1)$$

Where $f()$ is the sigmoid function. It can also be written as:

$$\frac{\partial}{\partial s_i} CE(f(s_i)) = \begin{cases} f(s_i) - 1 & \text{if } t_i = 1 \\ f(s_i) & \text{if } t_i = 0 \end{cases}$$
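
A quick way to sanity-check this derivative is a central finite difference. A minimal sketch, with an arbitrary scalar score and label:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bce(s, t):
    # binary cross-entropy for a single score/label pair
    f = sigmoid(s)
    return -t * np.log(f) - (1 - t) * np.log(1 - f)

s, t, eps = 1.3, 1.0, 1e-6
analytic = sigmoid(s) - t  # f(s_i) - 1 here, since t_i = 1
numeric = (bce(s + eps, t) - bce(s - eps, t)) / (2 * eps)
print(analytic, numeric)   # the two values agree to high precision
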
The loss itself can be implemented in NumPy, with scikit-learn's log_loss as a cross-check:

import numpy as np
from sklearn.metrics import log_loss


def binary_cross_entropy(X, y):
    m = y.shape[0]
    X = X.reshape(m)
    y = y.reshape(m)
    # apply sigmoid: f(x) = 1 / (1 + e^-x)
    fX = 1 / (1 + np.exp(-X))
    # CE = -y * log(fX) - (1 - y) * log(1 - fX), averaged over the m samples
    ce = np.sum(-y * np.log(fX) - (1 - y) * np.log(1 - fX)) / m
    return ce


X = np.array([[9.7], [0]])  # raw scores (logits), shape (N, 1)
Y = np.array([[0], [1]])    # binary labels, shape (N, 1)

print(binary_cross_entropy(X, Y))
# cross-check: log_loss expects probabilities, so apply the sigmoid first
print(log_loss(Y.ravel(), (1 / (1 + np.exp(-X))).ravel()))

Refer here for a detailed loss derivation.

Caffe: Sigmoid Cross-Entropy Loss Layer

Pytorch: BCEWithLogitsLoss

TensorFlow: sigmoid_cross_entropy
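
As an illustration of the framework losses listed above, a minimal sketch using PyTorch's BCEWithLogitsLoss and tf.nn.sigmoid_cross_entropy_with_logits (the TF2 counterpart of the sigmoid_cross_entropy reference); the inputs reuse the logits and labels from the NumPy example, and both APIs expect raw scores because the sigmoid is fused into the loss:

import tensorflow as tf
import torch

logits = [[9.7], [0.0]]
labels = [[0.0], [1.0]]

# PyTorch: sigmoid + BCE fused, mean reduction by default
pt_loss = torch.nn.BCEWithLogitsLoss()(torch.tensor(logits), torch.tensor(labels))

# TensorFlow: element-wise sigmoid cross-entropy, averaged manually
tf_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.constant(labels),
                                            logits=tf.constant(logits)))

print(pt_loss.item(), float(tf_loss))  # both ≈ 5.1966, matching the NumPy result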