GAN

Introduction

  • The generator creates samples that are intended to come from the same distribution as the training data; the discriminator examines samples to determine whether they are real or fake.

  • The discriminator learns using traditional supervised learning techniques, dividing inputs into two classes (real or fake). The generator is trained to fool the discriminator.

  • Notation: $z$ is the input noise; training examples $x$ are randomly sampled from the training set and used as input for the first player, the discriminator, represented by the function $D$; $G(z)$ is a fake sample created by the generator.

  • Gradient ascent on the discriminator;

  • Gradient descent on the generator (see the value function sketched below).
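For reference, the ascent and descent above are performed on the standard GAN minimax value function:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$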

The training process

The training process consists of simultaneous SGD. On each step, two minibatches are sampled: a minibatch of $x$ values from the dataset and a minibatch of $z$ values drawn from the model's prior over latent variables. Then two gradient steps are made simultaneously: one updating $\theta^{(D)}$ to reduce $J^{(D)}$ and one updating $\theta^{(G)}$ to reduce $J^{(G)}$.

  • Adam is the optimizer most commonly used for GANs.
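A minimal sketch of one such simultaneous update in TensorFlow, assuming `generator` and `discriminator` are Keras models defined elsewhere; the Adam settings (learning rate 1e-4, beta_1 = 0.5) are common DCGAN-style defaults, not values given on this page:

import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Two separate Adam optimizers, one per player (beta_1 = 0.5 is a common GAN choice).
generator_optimizer = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)

def train_step(images, generator, discriminator, noise_dim=100):
    # Minibatch of z values drawn from the model's prior over latent variables.
    noise = tf.random.normal([tf.shape(images)[0], noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        # J^(D): push real outputs toward 1 and fake outputs toward 0.
        disc_loss = (cross_entropy(tf.ones_like(real_output), real_output) +
                     cross_entropy(tf.zeros_like(fake_output), fake_output))
        # J^(G): push the discriminator's output on fakes toward 1.
        gen_loss = cross_entropy(tf.ones_like(fake_output), fake_output)

    # Two gradient steps made simultaneously, one per set of parameters.
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))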

KL divergence

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution differs from a second, reference probability distribution.

For discrete probability distributions $P$ and $Q$ defined on the same probability space $\mathcal{X}$, the Kullback–Leibler divergence between $P$ and $Q$ is defined to be

$$D_{\text{KL}}(P \parallel Q) = -\sum_{x \in \mathcal{X}} P(x) \log\left(\frac{Q(x)}{P(x)}\right)$$
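A small numeric illustration of this definition, using two made-up discrete distributions:

import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x))
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # ~0.184 nats
print(kl_divergence(q, p))  # ~0.192 nats: D_KL(P || Q) != D_KL(Q || P)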

Cost functions

Discriminator loss

This loss quantifies how well the discriminator is able to distinguish real images from fakes. It compares the discriminator's predictions on real images to an array of 1s, and its predictions on fake (generated) images to an array of 0s.

Goal: minimize $-\log(D(x)) - \log(1 - D(G(z)))$.

Assume real samples' labels are always 1 ($y_x = 1$) and fake samples' labels are always 0 ($y_z = 0$). Then

$$\text{Loss}_{real} = -y_x \log(D(x)) - (1 - y_x)\log(1 - D(x)) = -\log(D(x))$$

$$\text{Loss}_{fake} = -y_z \log(D(G(z))) - (1 - y_z)\log(1 - D(G(z))) = -\log(1 - D(G(z)))$$

$$\text{Loss} = \text{Loss}_{real} + \text{Loss}_{fake} = -\log(D(x)) - \log(1 - D(G(z)))$$

import tensorflow as tf

# Helper assumed by this snippet: binary cross-entropy on raw logits.
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)   # real -> 1
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)  # fake -> 0
    total_loss = real_loss + fake_loss
    return total_loss

Generator loss

The generator's loss quantifies how well it was able to trick the discriminator. Intuitively, if the generator is performing well, the discriminator will classify the fake images as real (or 1). Here, we compare the discriminator's decisions on the generated images to an array of 1s, i.e. $y_z = 1$:

$$\text{Loss}_{fake} = -y_z \log(D(G(z))) - (1 - y_z)\log(1 - D(G(z))) = -\log(D(G(z)))$$

def generator_loss(fake_output):
    # The generator wants the discriminator to output 1 on its samples.
    return cross_entropy(tf.ones_like(fake_output), fake_output)
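A quick sanity check of the two losses above on made-up logits (assumes the `tf`, `cross_entropy`, `discriminator_loss`, and `generator_loss` definitions from the previous snippets):

# Raw logits from the discriminator: positive means "real", negative means "fake".
confident_real = tf.constant([[5.0], [4.0]])    # D is sure the real batch is real
confident_fake = tf.constant([[-5.0], [-4.0]])  # D is sure the fake batch is fake
fooled_fake    = tf.constant([[5.0], [4.0]])    # D thinks the fakes are real

print(discriminator_loss(confident_real, confident_fake).numpy())  # small: D is doing well
print(generator_loss(confident_fake).numpy())                      # large: G is not fooling D
print(generator_loss(fooled_fake).numpy())                         # small: G is fooling D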

Maximum likelihood game

We might like to be able to do maximum likelihood learning with GANs, which would mean minimizing the KL divergence between the data and the model.

The KL divergence is not symmetric; minimizing $D_{KL}(p_{data} \parallel p_{model})$ is different from minimizing $D_{KL}(p_{model} \parallel p_{data})$.

  • GANs often choose to generate from very few modes, fewer than the limit imposed by the model capacity. The reverse KL, $D_{KL}(p_{model} \parallel p_{data})$, prefers to generate from as many modes of the data distribution as the model is able to; it does not prefer fewer modes in general. This suggests that mode collapse is driven by a factor other than the choice of divergence.
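The link between maximum likelihood and this KL divergence, sketched as a one-line identity:

$$\arg\max_{\theta} \mathbb{E}_{x \sim p_{data}}\left[\log p_{model}(x; \theta)\right] = \arg\min_{\theta} D_{KL}(p_{data} \parallel p_{model})$$

since $D_{KL}(p_{data} \parallel p_{model}) = \mathbb{E}_{x \sim p_{data}}[\log p_{data}(x)] - \mathbb{E}_{x \sim p_{data}}[\log p_{model}(x; \theta)]$ and the first term does not depend on $\theta$.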

Pix2Pix

Objective

  • Conditional GAN loss;

  • We also mix in a reconstruction term, using L1 distance rather than L2 because L1 encourages less blurring;

  • The final objective combines the conditional GAN loss with a weighted L1 term (all three are sketched after this list);

  • The heuristic (non-saturating) generator loss is used instead of the minimax loss.
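As a reference sketch, the objectives above in the form used by the Pix2Pix paper (Isola et al.), with $x$ the input image, $y$ the target image, and $\lambda$ the L1 weight:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$$

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]$$

$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)$$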

Structure

  • Borrows the structure of DCGAN;

  • Both generator and discriminator use modules of the form convolution-BatchNorm-ReLU;

  • Encoder-decoder network: the input is passed through a series of layers that progressively downsample, until a bottleneck layer, at which point the process is reversed. Such a network requires that all information flow pass through all the layers, including the bottleneck.

To give the generator a means to circumvent the bottleneck for information, we add skip connections, following the general shape of a U-Net. Specifically, we add skip connections between each layer $i$ and layer $n - i$, where $n$ is the total number of layers. Each skip connection simply concatenates all channels at layer $i$ with those at layer $n - i$.
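A minimal Keras sketch of the U-Net skip-connection idea described above; the layer counts and filter sizes are illustrative, not the actual Pix2Pix generator:

import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet_generator(input_shape=(256, 256, 3)):
    # Illustrative encoder-decoder with U-Net style skip connections.
    inp = layers.Input(shape=input_shape)

    # Encoder: progressively downsample.
    e1 = layers.Conv2D(64, 4, strides=2, padding='same', activation='relu')(inp)   # 128x128
    e2 = layers.Conv2D(128, 4, strides=2, padding='same', activation='relu')(e1)   # 64x64
    b = layers.Conv2D(256, 4, strides=2, padding='same', activation='relu')(e2)    # 32x32 bottleneck

    # Decoder: upsample and concatenate with the mirrored encoder layer (layer i with layer n - i).
    d1 = layers.Conv2DTranspose(128, 4, strides=2, padding='same', activation='relu')(b)   # 64x64
    d1 = layers.Concatenate()([d1, e2])
    d2 = layers.Conv2DTranspose(64, 4, strides=2, padding='same', activation='relu')(d1)   # 128x128
    d2 = layers.Concatenate()([d2, e1])
    out = layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(d2)   # 256x256

    return tf.keras.Model(inp, out)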

Patch GAN

  • Problem: the GAN discriminator only needs to model high-frequency structure, relying on the L1 term to force low-frequency correctness.

  • Solution: in order to model high frequencies, it is sufficient to restrict attention to the structure in local image patches.

The PatchGAN only penalizes structure at the scale of patches. This discriminator tries to classify whether each $N \times N$ patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of $D$.
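A minimal Keras sketch of a PatchGAN-style discriminator: fully convolutional, producing one logit per local patch rather than a single scalar. Depths and filter counts are illustrative:

import tensorflow as tf
from tensorflow.keras import layers

def patchgan_discriminator(input_shape=(256, 256, 3)):
    # The output is a grid of logits, one per receptive-field patch.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 4, strides=2, padding='same')(inp)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(128, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(256, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    patch_logits = layers.Conv2D(1, 4, padding='same')(x)  # one logit per patch
    return tf.keras.Model(inp, patch_logits)

# The per-patch logits can be averaged (or fed patch-wise into the BCE loss)
# to produce the real/fake decision for the whole image.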

Evaluation metrics

  • Run “real vs. fake” perceptual studies on Amazon Mechanical Turk (AMT);

  • Adopt the popular FCN-8s architecture for semantic segmentation;

