Sora: Video Generation Prompts And Their Limits
Video generation is the most expensive and least controllable AI medium. Even when models like Sora are available, getting useful clips is a craft — and the platform reality keeps shifting.
Lesson map
The main moves in order
1. Why video is the hardest modality
2. Text-to-video
3. Sora
4. Shot grammar
Why video is the hardest modality
A still image is one frame. A 10-second clip is hundreds of frames that must agree on what each object looks like, where it is, and how it moves. That coherence problem is why text-to-video models lag image models by a generation, and why running them is so expensive that platforms quietly come and go.
Sora and its successors — the moving target
OpenAI's Sora was the highest-profile text-to-video demo of 2024–2025, and its production availability has shifted multiple times. Treat the brand as an ecosystem signal more than a stable SKU; assume access, length limits, and pricing will change. The skills below transfer to whichever video model is currently available — Runway, Veo, Kling, or the next OpenAI release.
Shot-grammar prompting
1. Lead with the shot type — 'wide shot of', 'close-up on', 'overhead drone shot of'.
2. Describe the subject, then the action, then the camera movement.
3. Add lighting and time of day — 'late afternoon golden hour' beats 'sunny'.
4. End with a film/aesthetic reference — 'shot on 16mm film', '90s skate video aesthetic'.
5. Keep clips under the model's recommended length; prompts that imply longer scenes degrade fast.
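The ordering above can be captured in a tiny template helper. This is a minimal sketch, not any model's official API — the function name and fields are hypothetical, and the fixed shot → subject → action → camera → lighting → aesthetic order just mirrors the checklist.

```python
# Hypothetical shot-grammar prompt builder. Field names and ordering
# follow the checklist above; nothing here is an official Sora/Runway API.

def build_video_prompt(shot, subject, action, camera="", lighting="", aesthetic=""):
    """Assemble a prompt in shot -> subject -> action -> camera -> lighting -> aesthetic order."""
    parts = [shot, subject, action, camera, lighting, aesthetic]
    # Drop empty fields so optional elements don't leave dangling commas.
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_video_prompt(
    shot="wide shot of",
    subject="a cyclist on a coastal road",
    action="pedaling steadily past cliffs",
    camera="slow tracking shot from the left",
    lighting="late afternoon golden hour",
    aesthetic="shot on 16mm film",
)
```

Keeping the order fixed makes A/B testing easier: when you vary one field at a time, you can tell which element actually moved the output.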
Where these models fail
Compare the options
| Failure mode | What you see | Mitigation |
|---|---|---|
| Limb glitching | Hands warp, legs add joints | Avoid close-up on hands; loose clothing helps |
| Text in the scene | Garbled signage, fake letters | Avoid prompts with on-screen text |
| Multi-character consistency | Faces morph across cuts | Generate each character separately and composite |
| Physics violations | Liquids float, gravity off | Keep scenes simple; prefer slow motion |
| Audio mismatch | Generated audio is generic | Replace audio in post |
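The mitigations in the table can be turned into a pre-flight check on your prompt. Below is a hedged sketch of a keyword lint; the trigger lists are illustrative guesses keyed to the failure modes above, not an exhaustive or official taxonomy.

```python
# Illustrative prompt lint: flags phrases that tend to trigger the
# failure modes in the table above. Keyword lists are assumptions,
# not a validated taxonomy -- extend them from your own test notes.

RISKY_PATTERNS = {
    "limb glitching": ["close-up on hands", "fingers", "handshake"],
    "on-screen text": ["sign reading", "billboard with", "text on screen"],
    "physics violations": ["pouring", "splashing", "juggling"],
}

def lint_prompt(prompt):
    """Return (failure_mode, trigger_phrase) pairs found in the prompt."""
    lowered = prompt.lower()
    hits = []
    for mode, triggers in RISKY_PATTERNS.items():
        for phrase in triggers:
            if phrase in lowered:
                hits.append((mode, phrase))
    return hits
```

A lint hit doesn't mean the clip will fail, only that you're in a known weak spot — rewrite the risky element or plan to fix it in post.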
Applied exercise
1. Pick a 10-second moment you would otherwise shoot on a phone — a product demo intro, an establishing shot.
2. Write three prompt variations using the shot-grammar structure.
3. Generate all three on whatever video model you have access to.
4. Note which prompt elements changed the output the most. Save your top patterns as a personal style guide.
The big idea: video generation is a real production tool today, but it is the most expensive and least stable AI medium. Build your craft on the prompts, not the brand.
