ML_101
  • Introduction
  • ML Fundamentals
    • Basics
    • Optimization
    • How to prevent overfitting
    • Linear Algebra
    • Clustering
    • Calculate Parameters in CNN
    • Normalization
    • Confidence Interval
    • Quantization
  • Classical Machine Learning
    • Basics
    • Unsupervised Learning
  • Neural Networks
    • Basics
    • Activation function
    • Different Types of Convolution
    • Resnet
    • Mobilenet
  • Loss
    • L1 and L2 Loss
    • Hinge Loss
    • Cross-Entropy Loss
    • Binary Cross-Entropy Loss
    • Categorical Cross-Entropy Loss
    • (Optional) Focal Loss
    • (Optional) CORAL Loss
  • Computer Vision
    • Two Stage Object Detection
      • Metrics
      • ROI
      • R-CNN
      • Fast RCNN
      • Faster RCNN
      • Mask RCNN
    • One Stage Object Detection
      • FPN
      • YOLO
      • Single Shot MultiBox Detector(SSD)
    • Segmentation
      • Panoptic Segmentation
      • PSPNet
    • FaceNet
    • GAN
    • Imbalance problem in object detection
  • NLP
    • Embedding
    • RNN
    • LSTM
    • LSTM Ext.
    • RNN for text prediction
    • BLEU
    • Seq2Seq
    • Attention
    • Self Attention
    • Attention without RNN
    • Transformer
    • BERT
  • Parallel Computing
    • Communication
    • MapReduce
    • Parameter Server
    • Decentralized And Ring All Reduce
    • Federated Learning
    • Model Parallelism: GPipe
  • Anomaly Detection
    • DBSCAN
    • Autoencoder
  • Visualization
    • Saliency Maps
    • Fooling images
    • Class Visualization
Cross-Entropy Loss


The Cross-Entropy Loss is actually the only loss we are discussing here; the other loss names written in the title are simply other names for it or variations of it. The CE Loss is defined as:

$$CE = -\sum_{i}^{C} t_{i} \log(s_{i})$$

Where $t_i$ and $s_i$ are the ground truth and the CNN score for each class $i$ in $C$. Since an activation function (sigmoid / softmax) is usually applied to the scores before the CE loss computation, we write $f(s_i)$ to refer to the activations.
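
As a quick worked sketch of the formula above (made-up scores, three classes), the softmax activation is applied to the raw scores first, and the sum then reduces to the negative log of the true-class probability:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])   # raw CNN scores s_i for C = 3 classes (hypothetical values)
target = np.array([1, 0, 0])         # one-hot ground truth t_i: the true class is class 0

probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax activations f(s_i)
ce = -np.sum(target * np.log(probs))              # only the true-class term survives
# ce == -np.log(probs[0]) ≈ 0.417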

In a binary classification problem, where $C' = 2$, the Cross-Entropy Loss can also be defined as:

$$CE = -\sum_{i=1}^{C'=2} t_{i} \log(s_{i}) = -t_{1} \log(s_{1}) - (1 - t_{1}) \log(1 - s_{1})$$

Where it is assumed that there are two classes: $C_1$ and $C_2$. $t_1 \in [0,1]$ and $s_1$ are the ground truth and the score for $C_1$, and $t_2 = 1 - t_1$ and $s_2 = 1 - s_1$ are the ground truth and the score for $C_2$. That is the case when we split a multi-label classification problem into $C$ binary classification problems. See the next section on Binary Cross-Entropy Loss for more details.
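
A minimal numeric sketch of this binary form, using hypothetical values for $t_1$ and $s_1$:

import numpy as np

t1 = 1.0    # ground truth for C1 (hypothetical)
s1 = 0.8    # score for C1, e.g. a sigmoid output (hypothetical)

ce = -t1 * np.log(s1) - (1 - t1) * np.log(1 - s1)
# ce ≈ 0.223, i.e. -log(0.8); only the first term contributes because t1 = 1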

Logistic Loss and Multinomial Logistic Loss are other names for Cross-Entropy loss.

import numpy as np
from sklearn.metrics import log_loss


def softmax(X):
    # Row-wise softmax; subtracting the max improves numerical stability.
    exps = np.exp(X - np.max(X, axis=-1, keepdims=True))
    return exps / np.sum(exps, axis=-1, keepdims=True)


def cross_entropy(predictions, targets):
    # Mean cross-entropy over N samples, given predicted probabilities
    # and one-hot targets.
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce


predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.97]])  # (N, num_classes)
targets = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1]])  # (N, num_classes)

cross_entropy(predictions, targets)
# 0.7083767843022996

log_loss(targets, predictions)
# 0.7083767843022996

log_loss(targets, predictions) == cross_entropy(predictions, targets)
# True

The layers of Caffe, PyTorch and TensorFlow that use a Cross-Entropy loss without an embedded activation function are:

Caffe: Multinomial Logistic Loss Layer. Is limited to multi-class classification (does not support multiple labels).

PyTorch: BCELoss. Is limited to binary classification (between two classes).

TensorFlow: log_loss.
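
For instance, PyTorch's BCELoss expects probabilities rather than raw scores, so the activation has to be applied explicitly before the loss. A minimal sketch with made-up logits and labels:

import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 2.5])   # raw scores (hypothetical)
labels = torch.tensor([1.0, 0.0, 1.0])    # binary ground truth (hypothetical)

loss_fn = nn.BCELoss()                          # no embedded activation
loss = loss_fn(torch.sigmoid(logits), labels)   # sigmoid applied explicitly
# loss is the mean binary cross-entropy over the three elements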
