GAN
The generator fakes, the discriminator detects — two networks competing against each other, ultimately teaching the machine to create something from nothing
✦ See It in Action: Generated Images Become Clearer During Training
[Interactive demo] The generated image on the right gradually approaches the real image on the left as training proceeds; this is the generator evolving through competition with the discriminator.
Training Loss Curve
Generator Loss decreasing means it is getting better at faking; Discriminator Loss stabilizing means the competition is approaching equilibrium.
01 Plain English Explanation of GAN
A Story About Counterfeiting Money
Imagine two people:
The counterfeiter (the generator): at first they know nothing, and the "fake bills" they produce are obviously fake. But every time they get caught, they study what gave the forgery away and improve, becoming more and more convincing.
The police officer (the discriminator): they have a pile of real bills and specialize in spotting fakes. As the counterfeiter gets better, the officer is forced to sharpen their skills too, or they will be fooled.
The two push each other to improve through competition. When training ends, the counterfeiter produces bills that even the officer cannot tell from real ones; that is GAN's goal.
GAN Doesn't Need "Answer Keys"
Regular neural networks need lots of labeled data (each image paired with a correct answer). GAN only needs a pile of real data — the discriminator itself serves as a dynamic "grading standard", requiring no manual annotation.
This enables GAN to learn to generate images, audio, text — almost anything.
The Three-Step Training Cycle
Step 1: Train the discriminator. Show it real and fake images and let it learn to distinguish them, like teaching the officer to recognize real bills.
Step 2: Train the generator. Its goal is to make the discriminator misjudge its output as "real", like the counterfeiter studying how to fool the officer.
Step 3: Repeat until equilibrium. Ideally the generator's output leaves the discriminator guessing (50% accuracy); the officer can no longer tell the difference.
What can GAN do?
Image generation: create non-existent faces, landscapes, and artwork from scratch (StyleGAN)
Style transfer: turn photos into Van Gogh's style, or convert day scenes to night (CycleGAN)
Image inpainting: fill in missing parts of incomplete images, far more naturally than solid-color fills
Data augmentation: generate extra training samples in data-scarce domains such as medical imaging
Challenges with GAN
Training instability: the generator and discriminator need to grow in sync. If one side becomes too strong, the other cannot learn, like an elementary-school student playing chess against a world champion: the weaker side gets no useful feedback.
Mode collapse: the generator discovers that repeatedly producing one type of sample is enough to fool the discriminator, so it takes the easy way out and the generated content lacks diversity.
Building GAN Step by Step
From data sampling to adversarial training, build it piece by piece.
Using the Box-Muller method to sample a Gaussian distribution — this is the target distribution we want G to learn to mimic.
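A minimal plain-Python sketch of this step (function name and the target parameters N(3, 1) are illustrative):

```python
import math
import random

def sample_gaussian(n, mu=3.0, sigma=1.0, seed=0):
    """Box-Muller: map pairs of uniform samples to independent normal samples."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        r = math.sqrt(-2.0 * math.log(u1 + 1e-12))  # +1e-12 avoids log(0)
        out.append(mu + sigma * r * math.cos(2.0 * math.pi * u2))
        if len(out) < n:
            out.append(mu + sigma * r * math.sin(2.0 * math.pi * u2))
    return out

real_data = sample_gaussian(10000)  # the target distribution G should mimic
```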
G: noise → data; D: data → real/fake probability. Both have symmetric structures and are trained together.
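Assuming a toy 1-D setting, G and D can be mirrored one-hidden-layer MLPs (layer sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Parameters for a one-hidden-layer MLP."""
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(params, x):
    """tanh hidden layer, linear output (a logit for D, a sample for G)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

G = init_mlp(1, 16, 1)  # noise z    -> fake data point
D = init_mlp(1, 16, 1)  # data point -> logit; sigmoid(logit) = P(real)

z = rng.normal(size=(8, 1))
fake = forward(G, z)
p_real = 1.0 / (1.0 + np.exp(-forward(D, fake)))  # D's real/fake probability
```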
Real data label 1, generated data label 0 — let D learn to distinguish real from fake.
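This step can be sketched with a logistic-regression discriminator (a deliberate simplification of a full network); real samples get label 1, generated ones label 0:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = 0.0, 0.0                         # discriminator parameters

real = rng.normal(3.0, 1.0, 200)        # samples from the target N(3, 1)
fake = rng.normal(0.0, 1.0, 200)        # an untrained generator's output

def d_prob(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # D(x) = P(x is real)

lr = 0.5
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(200), np.zeros(200)])  # real -> 1, fake -> 0
for _ in range(200):
    g = d_prob(x, w, b) - y             # binary cross-entropy gradient wrt logit
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

accuracy = np.mean((d_prob(x, w, b) > 0.5) == (y == 1))
```

After a few hundred steps D separates the two clusters well, which is exactly what forces the generator to improve in the next step.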
Freeze D, let G generate samples that can fool D — G's gradients come from D's judgment.
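A sketch of the generator step with D frozen. The affine generator G(z) = mu + sigma*z is illustrative, and the non-saturating loss -log D(G(z)) is used in place of log(1 - D(G(z))), as is common in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
w_d, b_d = 2.0, -6.0   # frozen discriminator: D(x) = sigmoid(2x - 6),
                       # i.e. D only trusts samples near the real mean 3
mu, sigma = 0.0, 1.0   # generator parameters: G(z) = mu + sigma * z

lr = 0.1
for _ in range(300):
    z = rng.normal(size=256)
    x = mu + sigma * z                            # G(z)
    p = 1.0 / (1.0 + np.exp(-(w_d * x + b_d)))    # D(G(z)); D is not updated
    gx = -(1.0 - p) * w_d   # d(-log D(x))/dx: the signal D sends back to G
    mu -= lr * np.mean(gx)                        # chain rule: dx/dmu = 1
    sigma -= lr * np.mean(gx * z)                 # chain rule: dx/dsigma = z
```

The generator's mean migrates into the region D labels "real". Note that sigma tends to shrink here, a small-scale glimpse of the mode-collapse tendency.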
02 Code
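The pieces above combine into a complete toy training loop (all names and hyperparameters are illustrative; real GANs use deep networks and an optimizer such as Adam):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = 0.0, 0.0          # logistic discriminator: D(x) = sigmoid(w*x + b)
mu, sigma = 0.0, 1.0     # affine generator: G(z) = mu + sigma * z
lr_d, lr_g, batch = 0.2, 0.05, 128

def d_prob(x):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

for step in range(2000):
    # Discriminator step: real label 1, fake label 0
    real = rng.normal(3.0, 1.0, batch)          # target distribution N(3, 1)
    fake = mu + sigma * rng.normal(size=batch)
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(batch), np.zeros(batch)])
    g = d_prob(x) - y                           # BCE gradient wrt the logit
    w -= lr_d * np.mean(g * x)
    b -= lr_d * np.mean(g)

    # Generator step: non-saturating loss -log D(G(z)), D frozen
    z = rng.normal(size=batch)
    fake = mu + sigma * z
    gx = -(1.0 - d_prob(fake)) * w              # gradient D sends back to G
    mu -= lr_g * np.mean(gx)
    sigma -= lr_g * np.mean(gx * z)
```

When the loop works, mu drifts toward the real mean 3 and D's output on generated samples settles near 0.5, the 50%-guessing equilibrium described above.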
03 Academic Explanation
GAN (Generative Adversarial Network) is a generative model composed of a generator and a discriminator, which improve each other through adversarial training. The generator learns to generate fake data, and the discriminator learns to distinguish real from fake data.
Adversarial Training
The generator and discriminator compete against each other, like the counterfeiter and the police officer from the story above.
Objective Function
GAN's training objective is a minimax game:

min_G max_D V(D, G) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]
The discriminator D wants to maximize this objective (output 1 for real images, 0 for fake); the generator G wants to minimize it (make fake images judged as real).
Training Process
1. Train the discriminator: let it learn to distinguish real images from generated images
2. Train the generator: let it learn to fool the discriminator, generating more realistic images
3. Alternate the two steps until the generator's images are realistic enough that the discriminator cannot distinguish them
Nash Equilibrium and Global Optimum
Theoretically, GAN's optimal solution is when the generator perfectly reproduces the real data distribution p_data, at which point the discriminator can only output 0.5 for any input (unable to distinguish). This corresponds to Nash equilibrium in game theory — neither side can achieve a better result by unilaterally changing strategy.
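For a fixed G, the optimal discriminator has the closed form D*(x) = p_data(x) / (p_data(x) + p_g(x)). A quick numeric check (the Gaussian density here is illustrative) shows it collapses to 0.5 everywhere when p_g = p_data, where each expectation term in the objective equals log(1/2) and the optimal value is -log 4:

```python
import math

def p_gauss(x, mu=0.0, sigma=1.0):
    """Gaussian density, standing in for p_data (and for p_g at equilibrium)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

xs = [i / 10.0 for i in range(-40, 41)]
# Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) with p_g = p_data
d_star = [p_gauss(x) / (p_gauss(x) + p_gauss(x)) for x in xs]

# Each expectation term becomes log(1/2); the objective's value is -log 4
value = math.log(0.5) + math.log(0.5)
```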
Training Instability and Common Issues
Vanishing gradients: when the discriminator becomes too strong, D(G(z)) → 0 and log(1 - D(G(z))) has a near-zero gradient, so the generator receives no effective learning signal
Mode collapse: the generator finds that one type of sample reliably fools the discriminator, so it generates only that type and loses diversity
Oscillation: the G and D losses swing back and forth, making it difficult to converge to the equilibrium point
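The vanishing-gradient point can be checked numerically. With D(G(z)) = sigmoid(s) for logit s, the original loss log(1 - D(G(z))) has gradient -sigmoid(s) with respect to s, which vanishes as a strong discriminator drives s very negative; the common non-saturating replacement -log D(G(z)) keeps gradient magnitude near 1:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

s = -10.0                   # a strong D: D(G(z)) = sigmoid(-10) ≈ 4.5e-5
D_of_Gz = sigmoid(s)

grad_saturating = -D_of_Gz           # d/ds [log(1 - sigmoid(s))] = -sigmoid(s)
grad_non_saturating = 1.0 - D_of_Gz  # d/ds [log(sigmoid(s))] = 1 - sigmoid(s)
```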
Improvement Directions
WGAN: uses the Wasserstein distance instead of JS divergence, directly addressing the vanishing-gradient problem
DCGAN: introduces a convolutional architecture, significantly improving image quality and training stability
CycleGAN: style transfer without paired data, enforced by a cycle-consistency loss
StyleGAN: controllable, high-resolution face generation that separates high-level semantics from low-level textures
Summary
Core mechanism: generator vs discriminator, trained adversarially
Generator's job: generate images from noise
Theoretical optimum: Nash equilibrium point
Typical applications: image generation, style transfer