Video AI — Sora, Veo, Runway, Kling
Text-to-video became practical in 2025 and cinematic in 2026. Here's the state of the art and how to choose.
The video moment
Video generation went from 'jittery 4-second clips' in early 2024 to 'broadcast-grade 4K with synchronized dialogue and music' in early 2026. Four models lead. They're genuinely useful for pre-visualization, ads, b-roll, and short films — though full Hollywood-grade filmmaking is still human + AI, not AI alone.
Compare the options
| Model | Best for | Max length / resolution | Audio? |
|---|---|---|---|
| OpenAI Sora 2 | Cinematic physics, multi-subject scenes. | ~20s, 1080p (upscalable). | Synced audio + dialogue. |
| Google Veo 3.1 | Photorealism, audio quality, character dialogue. | ~60s, 1080p. | Best-in-class synced audio. |
| Runway Gen-4.5 | Character consistency across scenes; pro editing. | ~10s per shot; stitch in Runway. | Synced audio. |
| Kuaishou Kling 3.0 | Native 4K / 60fps, longest clips (5 min), human motion. | 5 min, 4K. | Synced audio. |
| Luma Dream Machine / Pika 2 | Fast iteration, social-media clips, affordable. | ~10s, 1080p. | Newer; varies by model. |
Prompting video
Video prompts have two extra slots beyond image prompts: motion and camera.
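If it helps to see all seven slots together (the image-prompting five plus motion and camera), here's a tiny illustrative helper that assembles a prompt from them. The function and its slot names are just this lesson's framework, not any vendor's API; a worked example follows.

```python
# Illustrative only: build a video prompt from the seven slots this lesson
# uses (subject, setting, action, camera, style, lighting, duration).
def video_prompt(subject: str, setting: str, action: str,
                 camera: str, style: str, lighting: str,
                 duration_s: int) -> str:
    return (f"{subject} in {setting} {action}. "
            f"Camera: {camera}. {style}, {lighting}. "
            f"{duration_s} seconds.")

print(video_prompt(
    subject="A chef",
    setting="a crowded Tokyo ramen shop",
    action="gently ladles broth into a bowl; steam rises",
    camera="slow dolly in on her hands, then tilt up to her focused face",
    style="Shot on 35mm, shallow depth of field",
    lighting="warm practical lighting from paper lanterns",
    duration_s=8,
))
```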
A video prompt that specifies subject, setting, action, camera move, style, lighting, and duration:
A chef in a crowded Tokyo ramen shop gently ladles broth into a bowl. Steam rises. Camera slowly dollies in on her hands, then tilts up to her focused face. Shot on 35mm, shallow depth of field, warm practical lighting from paper lanterns. 8 seconds.
Image-to-video and ref-image workflows
- Image-to-video: generate a still in Midjourney/Flux you love, then animate it in Runway or Kling (see the sketch after this list).
- Reference character: upload 3-5 images of a character to keep them consistent across shots (Runway Gen-4, Kling).
- Keyframe: specify first and last frame; the model fills the motion between (Luma, Runway).
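To make the image-to-video workflow concrete, here's a minimal sketch of the usual pattern: submit a job, then poll until the clip is ready. The endpoint, field names, and job states below are hypothetical placeholders, not any specific vendor's API; Runway, Kling, and Luma each have their own, so adapt this to your provider's docs.

```python
# Hypothetical image-to-video job: placeholder endpoint and field names.
import os
import time
import requests

API_BASE = "https://api.example-video.com/v1"  # hypothetical endpoint
HEADERS = {"Authorization": f"Bearer {os.environ.get('VIDEO_API_KEY', '')}"}

# 1. Submit: a still you like, plus motion and camera instructions.
job = requests.post(
    f"{API_BASE}/image-to-video",
    headers=HEADERS,
    json={
        "image_url": "https://example.com/ramen-chef.png",  # your Midjourney/Flux still
        "prompt": "Steam rises; camera slowly dollies in on her hands.",
        "duration_seconds": 8,
    },
).json()

# 2. Poll: video generation is async and typically takes tens of seconds.
while True:
    status = requests.get(f"{API_BASE}/jobs/{job['id']}", headers=HEADERS).json()
    if status["state"] in ("succeeded", "failed"):
        break
    time.sleep(5)

print(status.get("video_url", "generation failed"))
```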
What breaks
- Complex hand interactions (holding, typing) still glitch.
- Long narratives — characters drift over 10+ seconds without explicit reference.
- Physics in unusual scenarios (zero-g, underwater with many objects).
- Fine text on signs/screens — still garbled most of the time.
Ethical considerations
Video deepfakes of real people are a serious concern. All major providers (OpenAI, Google, Runway, Kuaishou) refuse to generate named public figures without their explicit opt-in, and they watermark outputs (C2PA + SynthID for Google). If you're shipping a product, disclose AI origin and respect the TAKE IT DOWN Act (US) and EU AI Act labeling.
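If you're shipping generated video, it's worth checking that the provenance metadata actually survived your pipeline (re-encoding can strip it). Here's a hedged sketch using the open-source c2patool CLI from the Content Authenticity Initiative; it assumes the tool is installed, and exact output and flags vary by version.

```python
# Sketch: check a generated clip for a C2PA provenance manifest.
# Assumes the open-source `c2patool` CLI is installed and on PATH.
import json
import subprocess

def read_c2pa_manifest(path: str) -> dict | None:
    """Return the file's C2PA manifest as a dict, or None if absent."""
    result = subprocess.run(
        ["c2patool", path],  # prints the manifest report as JSON by default
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None  # no manifest found, or the tool reported an error
    return json.loads(result.stdout)

manifest = read_c2pa_manifest("clip.mp4")
print("C2PA manifest found" if manifest else "No provenance data")
```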