GAN
The generator fakes, the discriminator detects — two networks competing against each other, ultimately teaching the machine to create something from nothing
✦ See It in Action: Generated Images Become Clearer During Training
[Interactive demo] The generated image on the right gradually approaches the real image on the left as training proceeds; this is the generator evolving through competition with the discriminator.
Training Loss Curve
Generator Loss decreasing means it is getting better at faking; Discriminator Loss stabilizing means the competition is approaching equilibrium.
01 Plain English Explanation of GAN
A Story About Counterfeiting Money
Imagine two people:
The counterfeiter (the generator): at first they know nothing, and the "fake bills" they produce are obviously fake. But every time they get caught, they study what gave the forgery away and improve, becoming more and more convincing.
The police officer (the discriminator): they have a pile of real bills and specialize in spotting fakes. As the counterfeiter gets better, the officer is forced to sharpen their skills too, or they will be fooled.
The two push each other to improve through competition. When training ends, the counterfeiter produces bills that even the officer cannot tell from real ones; that is GAN's goal.
GAN Doesn't Need "Answer Keys"
Regular neural networks need lots of labeled data (each image paired with a correct answer). GAN only needs a pile of real data — the discriminator itself serves as a dynamic "grading standard", requiring no manual annotation.
This enables GAN to learn to generate images, audio, text — almost anything.
The Three-Step Training Cycle
Step 1: Train the discriminator. Show it real and fake images and let it learn to distinguish them, like teaching the officer to recognize real bills.
Step 2: Train the generator. Its goal is to make the discriminator misjudge its output as "real", like the counterfeiter studying how to fool the officer.
Step 3: Repeat until equilibrium. Ideally the generator's output leaves the discriminator guessing (50% accuracy); the officer can no longer tell the difference.
What can GAN do?
Image generation: create non-existent faces, landscapes, and artwork from scratch (StyleGAN)
Style transfer: turn photos into Van Gogh's style, or convert day scenes to night (CycleGAN)
Image inpainting: fill in missing parts of incomplete images, far more naturally than solid-color fills
Data augmentation: generate extra training samples in data-scarce domains such as medical imaging
Challenges with GAN
Training instability: the generator and discriminator need to grow in sync. If one side becomes too strong, the other cannot learn, like an elementary-school student playing chess against a world champion: the weaker side gets no useful feedback.
Mode collapse: the generator discovers that repeatedly producing one type of sample is enough to fool the discriminator, so it takes the easy way out and the generated content lacks diversity.
Building GAN Step by Step
From data sampling to adversarial training, build it piece by piece.
Using the Box-Muller method to sample a Gaussian distribution — this is the target distribution we want G to learn to mimic.
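A minimal plain-Python sketch of this step (function name and the target parameters N(3, 1) are illustrative):

```python
import math
import random

def sample_gaussian(n, mu=3.0, sigma=1.0, seed=0):
    """Box-Muller: map pairs of uniform samples to independent normal samples."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        r = math.sqrt(-2.0 * math.log(u1 + 1e-12))  # +1e-12 avoids log(0)
        out.append(mu + sigma * r * math.cos(2.0 * math.pi * u2))
        if len(out) < n:
            out.append(mu + sigma * r * math.sin(2.0 * math.pi * u2))
    return out

real_data = sample_gaussian(10000)  # the target distribution G should mimic
```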
G: noise → data; D: data → real/fake probability. Both have symmetric structures and are trained together.
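Assuming a toy 1-D setting, G and D can be mirrored one-hidden-layer MLPs (layer sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Parameters for a one-hidden-layer MLP."""
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(params, x):
    """tanh hidden layer, linear output (a logit for D, a sample for G)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

G = init_mlp(1, 16, 1)  # noise z    -> fake data point
D = init_mlp(1, 16, 1)  # data point -> logit; sigmoid(logit) = P(real)

z = rng.normal(size=(8, 1))
fake = forward(G, z)
p_real = 1.0 / (1.0 + np.exp(-forward(D, fake)))  # D's real/fake probability
```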
Real data label 1, generated data label 0 — let D learn to distinguish real from fake.
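This step can be sketched with a logistic-regression discriminator (a deliberate simplification of a full network); real samples get label 1, generated ones label 0:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = 0.0, 0.0                         # discriminator parameters

real = rng.normal(3.0, 1.0, 200)        # samples from the target N(3, 1)
fake = rng.normal(0.0, 1.0, 200)        # an untrained generator's output

def d_prob(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # D(x) = P(x is real)

lr = 0.5
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(200), np.zeros(200)])  # real -> 1, fake -> 0
for _ in range(200):
    g = d_prob(x, w, b) - y             # binary cross-entropy gradient wrt logit
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

accuracy = np.mean((d_prob(x, w, b) > 0.5) == (y == 1))
```

After a few hundred steps D separates the two clusters well, which is exactly what forces the generator to improve in the next step.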
Freeze D, let G generate samples that can fool D — G's gradients come from D's judgment.
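A sketch of the generator step with D frozen. The affine generator G(z) = mu + sigma*z is illustrative, and the non-saturating loss -log D(G(z)) is used in place of log(1 - D(G(z))), as is common in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
w_d, b_d = 2.0, -6.0   # frozen discriminator: D(x) = sigmoid(2x - 6),
                       # i.e. D only trusts samples near the real mean 3
mu, sigma = 0.0, 1.0   # generator parameters: G(z) = mu + sigma * z

lr = 0.1
for _ in range(300):
    z = rng.normal(size=256)
    x = mu + sigma * z                            # G(z)
    p = 1.0 / (1.0 + np.exp(-(w_d * x + b_d)))    # D(G(z)); D is not updated
    gx = -(1.0 - p) * w_d   # d(-log D(x))/dx: the signal D sends back to G
    mu -= lr * np.mean(gx)                        # chain rule: dx/dmu = 1
    sigma -= lr * np.mean(gx * z)                 # chain rule: dx/dsigma = z
```

The generator's mean migrates into the region D labels "real". Note that sigma tends to shrink here, a small-scale glimpse of the mode-collapse tendency.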
02 Code
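The pieces above combine into a complete toy training loop (all names and hyperparameters are illustrative; real GANs use deep networks and an optimizer such as Adam):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = 0.0, 0.0          # logistic discriminator: D(x) = sigmoid(w*x + b)
mu, sigma = 0.0, 1.0     # affine generator: G(z) = mu + sigma * z
lr_d, lr_g, batch = 0.2, 0.05, 128

def d_prob(x):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

for step in range(2000):
    # Discriminator step: real label 1, fake label 0
    real = rng.normal(3.0, 1.0, batch)          # target distribution N(3, 1)
    fake = mu + sigma * rng.normal(size=batch)
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(batch), np.zeros(batch)])
    g = d_prob(x) - y                           # BCE gradient wrt the logit
    w -= lr_d * np.mean(g * x)
    b -= lr_d * np.mean(g)

    # Generator step: non-saturating loss -log D(G(z)), D frozen
    z = rng.normal(size=batch)
    fake = mu + sigma * z
    gx = -(1.0 - d_prob(fake)) * w              # gradient D sends back to G
    mu -= lr_g * np.mean(gx)
    sigma -= lr_g * np.mean(gx * z)
```

When the loop works, mu drifts toward the real mean 3 and D's output on generated samples settles near 0.5, the 50%-guessing equilibrium described above.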
03 Academic Explanation
GAN (Generative Adversarial Network) is a generative model composed of a generator and a discriminator, which improve each other through adversarial training. The generator learns to generate fake data, and the discriminator learns to distinguish real from fake data.
Adversarial Training
The generator and discriminator compete against each other, like the counterfeiter and the police officer from the story above.
Objective Function
GAN's training objective is a minimax game:

min_G max_D V(D, G) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]
The discriminator D wants to maximize this objective (output 1 for real images, 0 for fake); the generator G wants to minimize it (make fake images judged as real).
Training Process
1. Train the discriminator: let it learn to distinguish real images from generated images
2. Train the generator: let it learn to fool the discriminator, generating more realistic images
3. Alternate the two steps until the generator's images are realistic enough that the discriminator cannot distinguish them
Nash Equilibrium and Global Optimum
Theoretically, GAN's optimal solution is when the generator perfectly reproduces the real data distribution p_data, at which point the discriminator can only output 0.5 for any input (unable to distinguish). This corresponds to Nash equilibrium in game theory — neither side can achieve a better result by unilaterally changing strategy.
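For a fixed G, the optimal discriminator has the closed form D*(x) = p_data(x) / (p_data(x) + p_g(x)). A quick numeric check (the Gaussian density here is illustrative) shows it collapses to 0.5 everywhere when p_g = p_data, where each expectation term in the objective equals log(1/2) and the optimal value is -log 4:

```python
import math

def p_gauss(x, mu=0.0, sigma=1.0):
    """Gaussian density, standing in for p_data (and for p_g at equilibrium)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

xs = [i / 10.0 for i in range(-40, 41)]
# Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) with p_g = p_data
d_star = [p_gauss(x) / (p_gauss(x) + p_gauss(x)) for x in xs]

# Each expectation term becomes log(1/2); the objective's value is -log 4
value = math.log(0.5) + math.log(0.5)
```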
Training Instability and Common Issues
Vanishing gradients: when the discriminator becomes too strong, D(G(z)) → 0 and log(1 - D(G(z))) has a near-zero gradient, so the generator receives no effective learning signal
Mode collapse: the generator finds that one type of sample reliably fools the discriminator, so it generates only that type and loses diversity
Oscillation: the G and D losses swing back and forth, making it difficult to converge to the equilibrium point
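The vanishing-gradient point can be checked numerically. With D(G(z)) = sigmoid(s) for logit s, the original loss log(1 - D(G(z))) has gradient -sigmoid(s) with respect to s, which vanishes as a strong discriminator drives s very negative; the common non-saturating replacement -log D(G(z)) keeps gradient magnitude near 1:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

s = -10.0                   # a strong D: D(G(z)) = sigmoid(-10) ≈ 4.5e-5
D_of_Gz = sigmoid(s)

grad_saturating = -D_of_Gz           # d/ds [log(1 - sigmoid(s))] = -sigmoid(s)
grad_non_saturating = 1.0 - D_of_Gz  # d/ds [log(sigmoid(s))] = 1 - sigmoid(s)
```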
Improvement Directions
WGAN: uses the Wasserstein distance instead of JS divergence, directly addressing the vanishing-gradient problem
DCGAN: introduces a convolutional architecture, significantly improving image quality and training stability
CycleGAN: style transfer without paired data, enforced by a cycle-consistency loss
StyleGAN: controllable, high-resolution face generation that separates high-level semantics from low-level textures
Summary
Core mechanism: generator vs discriminator, trained adversarially
Generator's job: generate images from noise
Theoretical optimum: Nash equilibrium point
Typical applications: image generation, style transfer