GAN
The generator creates samples that are intended to come from the same distribution as the training data; the discriminator examines samples to determine whether they are real or fake.
The discriminator learns using traditional supervised learning techniques, dividing inputs into two classes (real or fake). The generator is trained to fool the discriminator.
Gradient Ascent on Discriminator
Gradient Descent on Generator
Adam is the most commonly used optimizer for GANs.
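The two players optimize the shared value function $V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$, with the discriminator ascending and the generator descending on it. As a concrete reference point, a minimal PyTorch sketch of this setup (fully-connected networks with hypothetical sizes and learning rates) might look like the following; the discriminator outputs a raw logit so the losses sketched below can use cross-entropy with logits.

```python
import torch

# Minimal sketch: fully-connected G and D with illustrative sizes.
latent_dim, data_dim = 64, 784

G = torch.nn.Sequential(              # generator: noise z -> fake sample
    torch.nn.Linear(latent_dim, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, data_dim), torch.nn.Tanh(),
)
D = torch.nn.Sequential(              # discriminator: sample -> real/fake logit
    torch.nn.Linear(data_dim, 256), torch.nn.LeakyReLU(0.2),
    torch.nn.Linear(256, 1),
)

# Adam for both players, with the commonly used beta_1 = 0.5.
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```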
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution is different from a second, reference probability distribution.
This method quantifies how well the discriminator is able to distinguish real images from fakes. It compares the discriminator's predictions on real images to an array of 1s, and the discriminator's predictions on fake (generated) images to an array of 0s.
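A minimal PyTorch sketch of this discriminator loss, assuming the discriminator returns raw logits (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(real_logits, fake_logits):
    """Binary cross-entropy: real predictions vs. an array of 1s, fake predictions vs. 0s."""
    real_loss = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return real_loss + fake_loss
```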
We might like to be able to do maximum likelihood learning with GANs, which would mean minimizing the KL divergence between the data and the model, $D_{\mathrm{KL}}(p_{\text{data}} \parallel p_{\text{model}})$.
GANs often choose to generate from very few modes; fewer than the limitation imposed by the model capacity. The reverse KL prefers to generate from as many modes of the data distribution as the model is able to; it does not prefer fewer modes in general. This suggests that the mode collapse is driven by a factor other than the choice of divergence.
Conditional GAN loss:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log(1 - D(x, G(x, z)))\big]$$
We also explore this option, using L1 distance rather than L2 as L1 encourages less blurring:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]$$
The final objective is:

$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)$$
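A hedged PyTorch sketch of this combined objective (the helper name `pix2pix_generator_loss`, the tensors `x`, `y`, `fake_y`, and the channel-wise concatenation fed to `D` are illustrative assumptions; the paper reports $\lambda = 100$):

```python
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(D, x, y, fake_y, lam=100.0):
    """Conditional adversarial term plus lambda * L1 reconstruction term."""
    # The conditional discriminator looks at the (input, output) pair,
    # concatenated along the channel axis.
    fake_logits = D(torch.cat([x, fake_y], dim=1))
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))  # try to fool D
    l1 = F.l1_loss(fake_y, y)                       # pushes low-frequency correctness
    return adv + lam * l1
```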
Using Heuristic instead of minimax
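In the minimax game the generator minimizes $\tfrac{1}{2}\,\mathbb{E}_{z}\big[\log(1 - D(G(z)))\big]$, whose gradients vanish when the discriminator confidently rejects the fake samples. The heuristic (non-saturating) formulation keeps the same discriminator loss but has the generator maximize the log-probability that the discriminator assigns to generated samples being real:

$$J^{(G)} = -\tfrac{1}{2}\,\mathbb{E}_{z}\big[\log D(G(z))\big]$$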
Borrow structure of DCGAN:
Both generator and discriminator use modules of the form convolution-BatchNorm-ReLU;
Encoder-decoder network: In such a network, the input is passed through a series of layers that progressively downsample, until a bottleneck layer, at which point the process is reversed. Such a network requires that all information flow pass through all the layers, including the bottleneck.
Problem: The GAN discriminator only models high-frequency structure, relying on an L1 term to force low-frequency correctness.
Solution: In order to model high frequencies, it is sufficient to restrict our attention to the structure in local image patches.
Run “real vs. fake” perceptual studies on Amazon Mechanical Turk (AMT);
Adopt the popular FCN-8s architecture for semantic segmentation;
Input noise $z$
Training examples $x$ are randomly sampled from the training set and used as input for the first player, the discriminator, represented by the function $D$.
The second player's input is a fake sample $G(z)$, created by the generator from the input noise $z$.
The training process consists of simultaneous SGD. On each step, two minibatches are sampled: a minibatch of $x$ values from the dataset and a minibatch of $z$ values drawn from the model's prior over latent variables. Then two gradient steps are made simultaneously: one updating $\theta^{(D)}$ to reduce $J^{(D)}$ and one updating $\theta^{(G)}$ to reduce $J^{(G)}$.
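Continuing the illustrative PyTorch setup from above (assuming `G`, `D`, `opt_G`, `opt_D` as defined there and a `real_batch` tensor of shape `[batch_size, data_dim]`), one simultaneous update step could look like:

```python
import torch
import torch.nn.functional as F

def train_step(real_batch, latent_dim=64):
    batch_size = real_batch.size(0)

    # Minibatch of z values drawn from the latent prior.
    z = torch.randn(batch_size, latent_dim)

    # Discriminator step: push D(x) toward 1 and D(G(z)) toward 0.
    fake = G(z).detach()                       # no gradient into G on this step
    d_loss = (
        F.binary_cross_entropy_with_logits(D(real_batch), torch.ones(batch_size, 1))
        + F.binary_cross_entropy_with_logits(D(fake), torch.zeros(batch_size, 1))
    )
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step (heuristic loss): push D(G(z)) toward 1.
    fake = G(torch.randn(batch_size, latent_dim))
    g_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(batch_size, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()

    return d_loss.item(), g_loss.item()
```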
For discrete probability distributions $P$ and $Q$ defined on the same probability space, the Kullback–Leibler divergence between $P$ and $Q$ is defined to be

$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$
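As a toy illustration (numbers chosen arbitrarily, not tied to any GAN), the definition can be evaluated directly for small discrete distributions:

```python
import math

# D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)) for discrete P, Q on the same support.
P = [0.9, 0.05, 0.05]   # a peaked distribution
Q = [1/3, 1/3, 1/3]     # a uniform distribution

kl_pq = sum(p * math.log(p / q) for p, q in zip(P, Q))
kl_qp = sum(q * math.log(q / p) for p, q in zip(P, Q))

print(round(kl_pq, 3))  # ~0.704 nats
print(round(kl_qp, 3))  # ~0.934 nats -> D_KL(P||Q) != D_KL(Q||P): not symmetric
```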
Goal: Minimize $D_{\mathrm{KL}}(p_{\text{data}} \parallel p_{\text{model}})$
Assume real samples' labels are always 1 ($y = 1$) and fake samples' labels are always 0 ($y = 0$).
The generator's loss quantifies how well it was able to trick the discriminator. Intuitively, if the generator is performing well, the discriminator will classify the fake images as real (or 1). Here, we will compare the discriminator's decisions on the generated images to an array of 1s.
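A matching PyTorch sketch (again assuming the discriminator outputs raw logits):

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_logits):
    """Compare D's decisions on generated images against an array of 1s."""
    return F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
```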
The KL divergence is not symmetric; minimizing $D_{\mathrm{KL}}(p_{\text{data}} \parallel p_{\text{model}})$ is different from minimizing $D_{\mathrm{KL}}(p_{\text{model}} \parallel p_{\text{data}})$.
To give the generator a means to circumvent the bottleneck for information, we add skip connections, following the general shape of a U-Net. Specifically, we add skip connections between each layer $i$ and layer $n - i$, where $n$ is the total number of layers. Each skip connection simply concatenates all channels at layer $i$ with those at layer $n - i$.
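A minimal sketch of this skip-connection pattern (a hypothetical two-level network with illustrative channel counts; the decoder concatenates its upsampled features with the mirrored encoder features along the channel axis):

```python
import torch

class TinyUNet(torch.nn.Module):
    """Illustrative two-level encoder-decoder with U-Net style skip connections."""

    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.enc1 = torch.nn.Conv2d(in_ch, 64, 4, stride=2, padding=1)   # downsample
        self.enc2 = torch.nn.Conv2d(64, 128, 4, stride=2, padding=1)     # bottleneck
        self.dec2 = torch.nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        # Decoder layer 1 sees its upsampled input concatenated with enc1's output.
        self.dec1 = torch.nn.ConvTranspose2d(64 + 64, out_ch, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = torch.relu(self.enc1(x))
        e2 = torch.relu(self.enc2(e1))
        d2 = torch.relu(self.dec2(e2))
        # Skip connection: concatenate channels from layer i with layer n - i.
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return torch.tanh(d1)
```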
PatchGAN only penalizes structure at the scale of patches. This discriminator tries to classify whether each patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.
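A rough PatchGAN-style sketch (illustrative channel counts; the network outputs a grid of per-patch logits that is averaged to give the final response of D):

```python
import torch

class PatchDiscriminator(torch.nn.Module):
    """Fully convolutional discriminator that scores local patches rather than whole images."""

    def __init__(self, in_ch=6):  # e.g. input image and output image concatenated
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), torch.nn.LeakyReLU(0.2),
            torch.nn.Conv2d(64, 128, 4, stride=2, padding=1),
            torch.nn.BatchNorm2d(128), torch.nn.LeakyReLU(0.2),
            torch.nn.Conv2d(128, 1, 4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, x):
        patch_logits = self.net(x)             # shape [batch, 1, H', W']
        return patch_logits.mean(dim=[2, 3])   # average patch responses -> final D output
```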