(Optional) Focal Loss
A much larger set of candidate object locations (~100k) is regularly sampled across an image, densely covering spatial positions, scales, and aspect ratios.
The training procedure is consequently dominated by easily classified background examples. This class imbalance is typically addressed via bootstrapping or hard example mining, but these approaches are not efficient enough.
To address the class imbalance, one method is to add a weighting factor α ∈ [0, 1] for class 1 and 1 − α for class −1, giving the α-balanced cross entropy CE(p_t) = −α_t log(p_t), where p_t denotes the model's estimated probability of the ground-truth class (p_t = p if y = 1, and 1 − p otherwise). α may be set by inverse class frequency or treated as a hyperparameter to be set by cross validation.
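As a concrete illustration, here is a minimal sketch (plain Python; the function name and example values are hypothetical) of the α-balanced cross entropy for a single example:

```python
import math

# Hypothetical helper: alpha-balanced cross entropy for one example.
# alpha weights the rare positive class; 1 - alpha weights the negatives.
def alpha_balanced_ce(p, y, alpha=0.75):
    p_t = p if y == 1 else 1 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-dependent weight
    return -alpha_t * math.log(p_t)

print(alpha_balanced_ce(0.8, 1))  # correctly classified positive, up-weighted
print(alpha_balanced_ce(0.8, 0))  # misclassified negative, down-weighted
```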
The loss function is reshaped to down-weight easy examples and thus focus training on hard negatives. A modulating factor (1 − p_t)^γ is added to the cross entropy loss, defining the focal loss FL(p_t) = −(1 − p_t)^γ log(p_t), where the focusing parameter γ ≥ 0 is tested over [0, 5] in the experiments.
There are two properties of the FL:
When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted. When γ = 0, FL is equivalent to CE. As γ is increased, the effect of the modulating factor is likewise increased. (γ = 2 works best in the experiments.)
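To see the first property numerically, the short sketch below (illustrative values only) tabulates how the modulating factor rescales the CE loss for hard versus easy examples:

```python
import math

# Illustrative values only: how (1 - p_t)**gamma rescales the CE loss.
# Easy examples (p_t near 1) are suppressed by orders of magnitude,
# while hard examples (p_t small) keep almost their full loss.
for p_t in (0.1, 0.5, 0.9, 0.99):
    ce = -math.log(p_t)                  # plain cross entropy
    for gamma in (0, 2):
        fl = (1 - p_t) ** gamma * ce     # gamma = 0 recovers CE
        print(f"p_t={p_t:<5} gamma={gamma}  CE={ce:.4f}  FL={fl:.4f}")
```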
In practice, the α-balanced form FL(p_t) = −α_t (1 − p_t)^γ log(p_t) is used in the experiments, as it yields slightly improved accuracy over the non-α-balanced form. Moreover, using the sigmoid activation function to compute p results in greater numerical stability.
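Below is a minimal sketch of this α-balanced, sigmoid-based focal loss, assuming PyTorch; the defaults α = 0.25 and γ = 2 follow the values reported in the paper, but the function name and signature are illustrative:

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """targets are 0/1 labels with the same shape as logits."""
    # Computing BCE directly on logits avoids log(sigmoid(x)) underflow.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # Mean reduction is a simplification; the paper normalizes by the
    # number of anchors assigned to ground-truth boxes.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Usage on a handful of example anchors:
logits = torch.tensor([2.0, -1.0, 0.5])
labels = torch.tensor([1.0, 0.0, 1.0])
print(sigmoid_focal_loss(logits, labels))
```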
γ: Focus more on hard examples.
α: Offset the class imbalance in the number of examples.