
(Optional) CORAL Loss


Consistent Rank Logits (CORAL) for Ordinal Regression

Network design

The last fully-connected layer has a single output unit (instead of one per class), and a 1-D layer of K-1 independent bias units is added after it.

# single shared-weight output unit: g(x_i, W)
self.fc = nn.Linear(4096, 1, bias=False)
# K-1 independent, trainable bias units b_k (one per binary classifier)
self.linear_1_bias = nn.Parameter(torch.zeros(num_classes-1).float())
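
For intuition, here is a minimal sketch of the full output head (the 4096-dimensional input and the class/variable names are assumptions for illustration). The single FC output is broadcast against the K-1 bias units, giving one logit per binary classifier:

import torch
import torch.nn as nn

class CoralHead(nn.Module):
    # minimal sketch of a CORAL output head; in_features=4096 is an assumed feature size
    def __init__(self, num_classes, in_features=4096):
        super().__init__()
        self.fc = nn.Linear(in_features, 1, bias=False)   # shared weight vector, single output
        self.linear_1_bias = nn.Parameter(torch.zeros(num_classes - 1).float())

    def forward(self, x):
        g = self.fc(x)                       # (batch, 1), i.e. g(x_i, W)
        logits = g + self.linear_1_bias      # broadcast to (batch, K-1): g(x_i, W) + b_k
        probas = torch.sigmoid(logits)       # P_hat(y_i^k = 1)
        return logits, probas

Because all K-1 binary classifiers share the same weight vector and differ only in their bias terms, the predicted probabilities can only differ through the ordering of the biases, which is what gives CORAL its rank-consistency guarantee.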

Loss function

Let $W$ denote the weight parameters of the neural network, excluding the bias units of the final layer. The penultimate layer, whose output is denoted as $g(x_i, W)$, shares a single weight with all nodes in the final output layer. $K-1$ independent bias units are then added to $g(x_i, W)$, such that $\{g(x_i, W) + b_k\}_{k=1}^{K-1}$ are the inputs to the corresponding binary classifiers in the final layer. Let $s(z) = 1/(1+\exp(-z))$ be the logistic sigmoid function. The predicted empirical probability for task $k$ is defined as:

$$\hat{P}(y_i^k = 1) = s\big(g(x_i, W) + b_k\big)$$

For model training, we minimize the loss function:

$$L(W, \mathbf{b}) = -\sum_{i=1}^{N} \sum_{k=1}^{K-1} \lambda^{k} \Big[ \log\big(s(g(x_i, W) + b_k)\big)\, y_i^k + \log\big(1 - s(g(x_i, W) + b_k)\big)\,\big(1 - y_i^k\big) \Big]$$

which is the weighted cross-entropy of the $K-1$ binary classifiers. For rank prediction, the binary labels are obtained via:

$$f_k(x_i) = \mathbb{1}\big\{\hat{P}(y_i^k = 1) > 0.5\big\}$$
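
A minimal sketch of this loss and the rank-prediction rule, assuming logits of shape (batch, K-1) as produced by the head above, extended binary labels levels of the same shape, and optional lambda^k task weights (the function and variable names here are illustrative):

import torch
import torch.nn.functional as F

def coral_loss(logits, levels, importance_weights=None):
    # logits: (batch, K-1) values g(x_i, W) + b_k; levels: (batch, K-1) binary labels y_i^k
    log_p = F.logsigmoid(logits)        # log s(z)
    log_not_p = F.logsigmoid(-logits)   # log(1 - s(z)), since 1 - s(z) = s(-z); numerically stable
    term = log_p * levels + log_not_p * (1.0 - levels)
    if importance_weights is not None:  # optional lambda^k weights, shape (K-1,)
        term = term * importance_weights
    # mean over the batch instead of the sum over i, which only rescales the loss
    return -torch.mean(torch.sum(term, dim=1))

def predict_label(logits):
    # predicted label = number of binary tasks with P_hat(y^k = 1) > 0.5
    return torch.sum(torch.sigmoid(logits) > 0.5, dim=1)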

Example

Let's take a look at the labels for 7 ranks ($K = 7$, classes indexed 0-6):

  • For cross-entropy, the one-hot encoded label for class 3 is $[0,0,0,1,0,0,0]^T$;

  • For the CORAL loss, the extended binary label is $[1,1,1,0,0,0]^T$ (one entry per binary classifier):

    levels = [[1] * label + [0] * (self.num_classes - 1 - label) for label in batch_y]

Suppose the predicted probabilities for the CORAL loss are $[0.9, 0.8, 0.6, 0.4, 0.2, 0.1]^T$. We count the entries that are $\geq 0.5$ (equivalently, find the position of the last entry $\geq 0.5$): three tasks are positive, so the predicted label is 3.

During training, the loss for the current sample is calculated as

$$\begin{aligned}
L &= -\sum_{k=1}^{K-1} \Big[ [1,1,1,0,0,0]_k \,\log\big([0.9,0.8,0.6,0.4,0.2,0.1]_k\big) + \big(1 - [1,1,1,0,0,0]_k\big)\,\log\big(1 - [0.9,0.8,0.6,0.4,0.2,0.1]_k\big) \Big] \\
  &= -\sum_{k=1}^{K-1} \Big[ [1,1,1,0,0,0]_k \,\log\big([0.9,0.8,0.6,0.4,0.2,0.1]_k\big) + [0,0,0,1,1,1]_k \,\log\big(1 - [0.9,0.8,0.6,0.4,0.2,0.1]_k\big) \Big]
\end{aligned}$$
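
A quick numerical check of the example above, using the predicted probabilities directly rather than logits:

import torch

probas = torch.tensor([0.9, 0.8, 0.6, 0.4, 0.2, 0.1])   # P_hat(y^k = 1) for k = 1..6
levels = torch.tensor([1., 1., 1., 0., 0., 0.])          # extended binary label for class 3

loss = -torch.sum(levels * torch.log(probas) + (1 - levels) * torch.log(1 - probas))
predicted_label = int((probas > 0.5).sum())

print(loss.item())       # ~1.679
print(predicted_label)   # 3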

Ordinal Regression

Network design

The last fully-connected layer outputs (num_classes-1)*2 logits, i.e., one pair of logits (a 2-way softmax) for each of the K-1 binary classifiers.

self.fc = nn.Linear(2048 * block.expansion, (self.num_classes-1)*2)

The final prediction is obtained in the same way as for the CORAL loss: count the binary classifiers whose predicted probability exceeds 0.5.

# logits reshaped to (batch, num_classes-1, 2); take P(y^k = 1) from each 2-way softmax
probas = F.softmax(logits, dim=2)[:, :, 1]
predict_levels = probas > 0.5
# predicted label = number of binary tasks predicted positive
predicted_labels = torch.sum(predict_levels, dim=1)

Loss function

def cost_fn(logits, levels, imp):
    # logits: (batch, num_classes-1, 2); levels: (batch, num_classes-1) binary labels;
    # imp: per-task importance weights (lambda^k)
    val = (-torch.sum((F.log_softmax(logits, dim=2)[:, :, 1]*levels
                      + F.log_softmax(logits, dim=2)[:, :, 0]*(1-levels))*imp, dim=1))
    return torch.mean(val)
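
A hedged usage sketch for the cost_fn defined above, assuming the raw FC output of shape (batch, (num_classes-1)*2) from the layer above is reshaped to (batch, num_classes-1, 2); imp holds the per-task importance weights (all ones for equally weighted tasks):

import torch
import torch.nn.functional as F

batch_size, num_classes = 4, 7

# raw FC output -> one 2-way softmax per binary task
logits = torch.randn(batch_size, (num_classes - 1) * 2).view(batch_size, num_classes - 1, 2)

# extended binary labels, e.g. label 3 -> [1,1,1,0,0,0]
batch_y = [3, 0, 6, 2]
levels = torch.tensor([[1] * y + [0] * (num_classes - 1 - y) for y in batch_y], dtype=torch.float)

imp = torch.ones(num_classes - 1)   # equal importance for all K-1 tasks
loss = cost_fn(logits, levels, imp)

# prediction, as in the snippet above
probas = F.softmax(logits, dim=2)[:, :, 1]
predicted_labels = torch.sum(probas > 0.5, dim=1)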