Image/Video/Audio Deep-Dive
Sora, Veo, Runway, ElevenLabs, Suno.
Generative AI beyond text is a full stack of specialized models. Know who made what, what they’re good at, and what they cost. All of this is moving fast — check current benchmarks before committing to a vendor.
Image generation
- DALL·E 3 — best instruction-following, in ChatGPT.
- Midjourney v7 — best aesthetic, worst controllability.
- Stable Diffusion 3.5 / SDXL — open, self-hostable.
- Flux Pro / Flux Schnell — best text rendering in images.
- Imagen 4 — Google’s flagship.
Video generation
- Sora 2 (OpenAI) — up to 60 seconds, consistent physics.
- Veo 3 (Google DeepMind) — audio-plus-video synthesis.
- Runway Gen-4 — filmmaker-focused, best camera controls.
- Kling, Luma Dream Machine — strong commercial alternatives (both are proprietary, not open-weight).
Audio / voice / music
- ElevenLabs v3 — voice cloning and TTS.
- Suno v5 / Udio — song generation with vocals.
- Stable Audio Open — open-weight audio generation.
The provenance problem
As generated media becomes indistinguishable from reality, provenance (the verifiable trail of where something came from) becomes more important than the output itself. The C2PA standard (backed by Adobe, Microsoft, OpenAI, and the BBC, among others) attaches cryptographically signed metadata, known as Content Credentials, to media files, recording how a file was made and edited. Expect platforms to increasingly require or display this.
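To make the idea of signed provenance metadata concrete, here is a minimal sketch in Python. It is not the C2PA format: real C2PA manifests use X.509 certificates and a dedicated container embedded in the file, while this toy version uses a shared-secret HMAC as a stand-in. The names `make_manifest`, `verify_manifest`, and `DEMO_KEY` are hypothetical and exist only for illustration. What it does show is the core mechanism: a claim is bound to a hash of the content and then signed, so changing either the content or the claim breaks verification.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. Real C2PA signing uses X.509 certificates,
# not a shared secret; HMAC is only a stand-in here.
DEMO_KEY = b"demo-signing-key"


def make_manifest(content: bytes, generator: str, key: bytes = DEMO_KEY) -> dict:
    """Build a signed provenance record bound to the content's SHA-256 hash."""
    claim = {
        "generator": generator,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    # Serialize deterministically so signer and verifier hash the same bytes.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(content: bytes, manifest: dict, key: bytes = DEMO_KEY) -> bool:
    """Return True only if the signature is valid and the content hash matches."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    actual_hash = hashlib.sha256(content).hexdigest()
    return manifest["claim"]["content_sha256"] == actual_hash


if __name__ == "__main__":
    image = b"fake image bytes"
    manifest = make_manifest(image, generator="example-image-model")
    print(verify_manifest(image, manifest))         # True
    print(verify_manifest(image + b"x", manifest))  # False: content was altered
```

Note the limitation this sketch shares with real provenance metadata: a valid signature proves the record is intact, but a file with its metadata stripped simply carries no record at all.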
The ethics checkpoint
Voice cloning enables scams. Photorealistic image generation enables non-consensual deepfakes. Music generation disrupts real creators’ livelihoods. Every generative tool decision is also an ethical decision — whose labor are you displacing, whose likeness are you using, whose consent do you have?
