How Diffusion Models Actually Work

An AI that paints starts with pure noise and removes it, one step at a time, until a picture appears. Here's the surprisingly beautiful math behind it.

28 min · Reviewed 2026

Noise in, picture out

Almost every modern image AI — DALL-E 3, Midjourney, Stable Diffusion, Flux, Imagen — is a diffusion model. The core idea is strange and brilliant: instead of 'drawing,' the AI subtracts. It starts with a canvas of pure random noise (like TV static) and removes noise step by step until a picture emerges. Your prompt steers which picture emerges.

How a diffusion model gets trained (the forward process)

Take a real picture from the training data — say, a photo of a dog.
Add a little random noise. The dog is still obvious.
Add more noise. Now it's grainy and hard to make out.
Keep adding noise over many steps, until the picture is pure static — indistinguishable from random noise.
Train a neural network to reverse ONE step: given a noisy image and its noise level, predict the noise that was added to it.
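The forward process above can be sketched in a few lines of Python with NumPy. Two parts are assumptions chosen for illustration: the noise schedule (the linear schedule from the original DDPM paper, 1,000 steps) and the random 8x8 "image" standing in for the dog photo. Because every step adds Gaussian noise, all the steps together collapse into one formula, so training can jump straight to any noise level `t` instead of looping through every earlier step.

```python
import numpy as np

# Assumed schedule (illustration only): DDPM-style linear schedule,
# 1,000 steps with beta rising from 0.0001 to 0.02. Real models vary.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # noise added at each step
alpha_bars = np.cumprod(1.0 - betas)    # fraction of original signal (variance) left after step t

def add_noise(x0, t, rng):
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise  # training target: predict `noise` given (x_t, t)

rng = np.random.default_rng(0)
dog = rng.uniform(-1.0, 1.0, size=(8, 8, 3))  # hypothetical stand-in for a photo scaled to [-1, 1]
for t in (0, 250, 500, 999):
    x_t, _ = add_noise(dog, t, rng)
    # Correlation with the original: near 1 means "dog still obvious", near 0 means static.
    corr = np.corrcoef(dog.ravel(), x_t.ravel())[0, 1]
    print(f"step {t:4d}: signal weight {np.sqrt(alpha_bars[t]):.3f}, correlation {corr:.2f}")
```

Knowing the exact noise that was mixed in is what makes the training target well defined: given `x_t` and `noise`, the clean image can be recovered exactly, and the network's job is to guess `noise` without being told.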

Do that with billions of images and their captions. The network learns what noise looks like AT EVERY LEVEL and how to peel it back toward a coherent image, guided by the caption.
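For readers who want the math, the training described above is usually written as follows for DDPM-style models. Treat it as a sketch of the common "predict the noise" formulation, not any one product's exact recipe; some newer models, including Stable Diffusion 3 and Flux, train on a related rectified-flow objective instead. The symbols are this sketch's own: $x_0$ is the clean image (or latent), $\beta_t$ the small amount of noise added at step $t$, $\epsilon$ Gaussian noise, $c$ the caption, $\epsilon_\theta$ the network, and $\varnothing$ an empty caption.

```latex
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(\sqrt{1-\beta_t}\,x_{t-1},\; \beta_t I\right) \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
  \qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s), \quad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\,c,\,t,\,\epsilon}
  \left[ \big\lVert \epsilon - \epsilon_\theta(x_t, t, c) \big\rVert^2 \right] \\
\hat\epsilon &= \epsilon_\theta(x_t, t, \varnothing)
  + w\,\big(\epsilon_\theta(x_t, t, c) - \epsilon_\theta(x_t, t, \varnothing)\big)
\end{aligned}
```

The first line is one noising step; the second is the shortcut that lets training jump to any step $t$; the third is the loss the network minimizes. The last line is classifier-free guidance, a widely used way for the caption to steer generation: the model is trained with the caption sometimes replaced by $\varnothing$, and at each sampling step the two predictions are combined with a guidance scale $w$. Setting $w = 1$ gives the plain captioned prediction, and larger $w$ follows the prompt more strongly.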

Generating a new image (the reverse process)

Start with pure noise and your text prompt.
The model predicts: 'if this is a partially noisy picture of <your prompt>, what noise should I remove?'
Subtract that predicted noise. The picture is now slightly less noisy.
Repeat, typically 20–50 times, each step getting the image closer to a real picture matching the prompt.
After the final step, the noise is gone and a finished picture remains.
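The generation loop above can be sketched in Python. This is a minimal sketch under stated assumptions, not any product's sampler: it uses the deterministic DDIM update (one common way to finish in about 50 steps; the original DDPM sampler takes all 1,000 small steps and re-injects fresh noise each time), an assumed linear noise schedule, and `predict_noise`, a hypothetical stand-in for the trained network. A real network would also receive the prompt. Each step subtracts the predicted noise to estimate the clean picture, then re-adds a smaller amount to land on the next, less noisy level.

```python
import numpy as np

# Assumed noise schedule: linear DDPM-style schedule over 1,000 training steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def ddim_sample(predict_noise, shape, num_steps=50, seed=0):
    """Deterministic DDIM sampling (eta = 0) over `num_steps` of the T noise levels."""
    rng = np.random.default_rng(seed)
    # Visit noise levels from noisiest (T - 1) down to 0, evenly spaced.
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure noise
    for i, t in enumerate(timesteps):
        eps = predict_noise(x, t)  # "what noise should I remove?"
        # Subtract the predicted noise to estimate the clean picture.
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        # Move to the next, less noisy level; after the last step no noise remains.
        ab_next = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x = np.sqrt(ab_next) * x0_hat + np.sqrt(1.0 - ab_next) * eps
    return x

# Hypothetical "perfect" predictor for a toy world that contains exactly one picture.
target = np.array([0.5, -1.0, 2.0])

def oracle(x, t):
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

print(ddim_sample(oracle, target.shape))
```

Because the toy `oracle` already knows the one picture in its world, the loop lands on `target`. A trained network only approximates this, so the quality of its noise predictions decides the quality of the final image.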

Latent diffusion (the trick that made it fast)

Doing diffusion directly on full-resolution pixels is slow: a 1024x1024 RGB image holds over 3 million numbers, and the network has to process all of them at every step. Stable Diffusion's 2022 breakthrough was to work in latent space instead: a compressed representation learned by a separate autoencoder. In Stable Diffusion 1.x, for example, a 512x512 image becomes a 64x64 latent with 4 channels, about 48x fewer numbers. Diffusion happens entirely in latent space, then the decoder turns the final latent into a full image. Flux and Stable Diffusion 3.5 use the same approach.
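The size saving described above is plain arithmetic, sketched here in Python. It assumes Stable Diffusion 1.x's setup (the autoencoder shrinks each side 8x and keeps 4 latent channels), which works out to exactly 48x. The 16-channel line is an assumption based on what has been reported for newer autoencoders such as those in SD3 and Flux.

```python
def num_values(height, width, channels):
    """How many numbers the diffusion network must process for one image or latent."""
    return height * width * channels

# Stable Diffusion 1.x: 8x smaller per side, 4 latent channels.
pixels = num_values(512, 512, 3)              # RGB image
latent = num_values(512 // 8, 512 // 8, 4)    # 64 x 64 x 4 latent
print(pixels, latent, pixels / latent)        # 786432 16384 48.0

# Same 8x downsampling at 1024x1024 with 4 channels (as in SDXL): still 48x fewer numbers.
pixels_1k = num_values(1024, 1024, 3)
latent_1k = num_values(1024 // 8, 1024 // 8, 4)
print(pixels_1k / latent_1k)                  # 48.0

# Assumption: 16 latent channels (reported for SD3 and Flux) trade some saving for detail.
latent_16 = num_values(1024 // 8, 1024 // 8, 16)
print(pixels_1k / latent_16)                  # 12.0
```

The ratio depends only on the downsampling factor and channel count, not on the image size, which is why the same autoencoder design scales from 512x512 to 1024x1024 without losing its speedup.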

Diffusion vs. autoregressive image models

Diffusion (Stable Diffusion, Flux, Midjourney)	Autoregressive (GPT-4o image generation, some research models)
Generates the whole image at once, then refines it over many steps.	Generates one pixel or patch, then the next, like text tokens.
Fast, parallel, high quality.	Slower, but a natural fit with LLMs.
Dominant approach in 2026.	Growing as multimodal LLMs improve.
ControlNet, LoRA, and IP-Adapter work here.	Different adapter ecosystem.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creative-diffusion-explained-builders

What is the main idea of "How Diffusion Models Actually Work"?
1. An AI that paints starts with pure noise and removes it, one step at a time, until a picture appears.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "How Diffusion Models Actually Work"?
1. denoising
2. diffusion
3. latent space
4. forward/reverse process
Which of these is a step in training a diffusion model?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Take a real picture from the training data — say, a photo of a dog.
4. Use the first answer without checking it
What should a careful learner remember about "The magic detail: classifier-free guidance"?
1. Use AI to draft or organize ideas about diffusion, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use the AI answer as a draft, then check it against a reliable source.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about diffusion be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about diffusion.
Which of these happens early in the forward (noising) process?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Use the first answer without checking it
4. Add a little random noise. The dog is still obvious.
