Video Generation at the API Level
Behind the glossy UIs, video models expose REST APIs. Here's how to call Sora, Veo, and Runway programmatically and build production pipelines.
What this lesson covers
1. Video generation is inherently async
2. Sora API
3. Veo API
4. Runway API
Video generation is inherently async
A single video generation call takes 30 seconds to 5 minutes. Every reputable API is asynchronous: POST a job, get a task ID, poll (or listen on a webhook) for completion, download the result. Build this pattern into your product from day one — latency retrofits are expensive.
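The submit, poll, download cycle can be sketched generically. In this sketch, `get_status` is a hypothetical stand-in for whichever provider status endpoint you call; the `state` keys and a hard timeout are assumptions, not any specific provider's API:

```python
import time

def poll_until_done(get_status, task_id, interval=5.0, timeout=600.0):
    """Poll a hypothetical get_status(task_id) -> dict until the job
    finishes, with a hard timeout so a stuck job cannot hang a worker."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(task_id)
        if status["state"] == "completed":
            return status  # caller downloads status["url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {task_id} did not finish in {timeout}s")
```

The timeout matters in production: without it, a provider that silently drops a job ties up a worker forever.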
Sora 2 API (OpenAI)
OpenAI released Sora 2 via API in late 2025. Note, however, that in April 2026 OpenAI announced it will discontinue the Sora web and app experiences, with the API following in September 2026, so treat Sora as a short-term option rather than a long-term bet. Structure your code behind a provider interface so the model can be swapped out.
Sora 2 via the OpenAI Python SDK, with a simple polling loop:

```python
from openai import OpenAI
import time
import urllib.request

client = OpenAI()

# Submit the generation job (async: returns immediately with a job id)
job = client.videos.generate(
    model="sora-2",
    prompt=(
        "A chef ladles broth into a ramen bowl, steam rising, "
        "35mm film look. Camera dollies in slowly."
    ),
    duration=10,  # seconds
    resolution="1080p",
    aspect_ratio="16:9",
)

# Poll for completion
while True:
    status = client.videos.retrieve(job.id)
    if status.status == "completed":
        video_url = status.video_url
        break
    elif status.status == "failed":
        raise RuntimeError(status.error)
    time.sleep(5)

# Download and store the result
urllib.request.urlretrieve(video_url, "./ramen.mp4")
```

Veo 3.1 API (Google Vertex AI)
Veo 3.1 via Vertex AI, using the long-running operation (LRO) pattern:

```python
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

aiplatform.init(project="my-project", location="us-central1")
client = aiplatform.gapic.PredictionServiceClient()

endpoint = client.endpoint_path(
    project="my-project",
    location="us-central1",
    endpoint="publishers/google/models/veo-3.1-generate-001",
)

instance = json_format.ParseDict({
    "prompt": "Cinematic drone shot over rice terraces in Bali at sunrise",
    "duration_seconds": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "generate_audio": True,
}, Value())

# Long-running operation: submit, then block on the result with a timeout
operation = client.predict_long_running(
    endpoint=endpoint,
    instances=[instance],
)
result = operation.result(timeout=600)
video_bytes = result.predictions[0]["video_bytes"]
```

Runway Gen-4.5 API
Runway image-to-video (here with the gen4_turbo model). Starting from a reference image is the most reliable quality path:

```python
import requests
import time

RUNWAY_KEY = "rwa_..."
headers = {
    "Authorization": f"Bearer {RUNWAY_KEY}",
    "Content-Type": "application/json",
}

# Image-to-video: animate an existing frame (most consistent quality path)
resp = requests.post(
    "https://api.dev.runwayml.com/v1/image_to_video",
    headers=headers,
    json={
        "promptImage": "https://cdn.example.com/hero-frame.png",
        "promptText": "Camera slowly pulls back to reveal the full landscape",
        "model": "gen4_turbo",
        "duration": 10,
        "ratio": "1280:720",
    },
)
resp.raise_for_status()
task_id = resp.json()["id"]

# Poll the task until it succeeds or fails
while True:
    status = requests.get(
        f"https://api.dev.runwayml.com/v1/tasks/{task_id}", headers=headers
    ).json()
    if status["status"] == "SUCCEEDED":
        video_url = status["output"][0]
        break
    if status["status"] == "FAILED":
        raise RuntimeError(status.get("failure", "Runway task failed"))
    time.sleep(5)
```

Production pipeline pattern
1. Accept a job from a user. Store it in a queue (Redis, SQS).
2. A worker picks up the job, calls the video API, and stores the task_id.
3. A webhook handler or polling worker updates the job status.
4. On success, download the video to object storage (S3/R2).
5. Notify the user (email, websocket, push).
6. Periodic cleanup: providers keep result URLs for hours or days, not forever.
Provider ergonomics compared
| Provider | Submit pattern | Polling / webhook | Output format |
|---|---|---|---|
| Sora 2 (OpenAI) | client.videos.generate() — sync-style SDK. | client.videos.retrieve(id) polling. | Signed URL, MP4. |
| Veo 3.1 (Vertex AI) | client.predict_long_running() — LRO. | operation.result() with timeout. | Video bytes or GCS URI. |
| Runway Gen-4.5 | POST /image_to_video or /text_to_video. | GET /tasks/{id} polling. | Hosted URL (hours TTL). |
| Kling 3.0 | POST with signed auth; token-based. | Polling; webhook on enterprise. | Hosted URL + C2PA metadata. |
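One practical consequence of the table above: each provider reports job status in its own vocabulary, so it pays to normalize statuses at the boundary. A minimal sketch; the raw status strings are illustrative, not an exhaustive or verified list per provider:

```python
# Map each provider's raw status strings onto a small common vocabulary
# so the rest of the pipeline stays provider-agnostic.
STATUS_MAP = {
    "sora":   {"completed": "done", "failed": "failed",
               "in_progress": "running", "queued": "running"},
    "veo":    {"SUCCEEDED": "done", "FAILED": "failed", "RUNNING": "running"},
    "runway": {"SUCCEEDED": "done", "FAILED": "failed",
               "RUNNING": "running", "PENDING": "running"},
}

def normalize_status(provider: str, raw: str) -> str:
    status = STATUS_MAP[provider].get(raw)
    if status is None:
        # Fail loudly on unknown states rather than silently spinning
        raise ValueError(f"unknown status {raw!r} from {provider}")
    return status
```

Failing loudly on unknown states is deliberate: a new status value from a provider should surface as an error, not an infinite polling loop.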
Provider abstraction layer
Abstract providers behind a common interface; video models consolidate fast, and swapping one out should not require rewriting your pipeline.
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class VideoJob:
    prompt: str
    duration: int
    resolution: str
    ref_image_url: str | None = None


class VideoProvider(ABC):
    @abstractmethod
    async def submit(self, job: VideoJob) -> str: ...  # returns task_id

    @abstractmethod
    async def status(self, task_id: str) -> dict: ...

    @abstractmethod
    async def download(self, task_id: str) -> bytes: ...


class SoraProvider(VideoProvider): ...
class VeoProvider(VideoProvider): ...
class RunwayProvider(VideoProvider): ...
class KlingProvider(VideoProvider): ...


def pick_provider(job: VideoJob, policy: str) -> VideoProvider:
    if policy == "cheapest_4k":
        return KlingProvider()
    if policy == "best_physics":
        return VeoProvider()
    if policy == "best_character_consistency":
        return RunwayProvider()
    return VeoProvider()  # sensible default
```

Quality-control pipeline
1. Generate 3-5 candidates per shot in parallel.
2. Run each through a lightweight classifier (CLIP similarity to the prompt, motion analysis, face-detection stability).
3. Pick the top candidate automatically; queue the lower candidates for human review.
4. On human rejection, feed the feedback back as a refined prompt (closed loop).
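Steps 1 through 3 can be sketched as a triage function. Here `score_fn` is a stand-in for the real classifier (CLIP similarity, motion analysis, and so on), and the threshold is an illustrative assumption:

```python
def triage_candidates(candidates, score_fn, review_threshold=0.5):
    """Rank candidates by score; return (auto_pick, for_human_review)."""
    # Compute each score once, then sort best-first
    scored = sorted(
        ((score_fn(c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best = scored[0][1]
    # Lower candidates above the threshold are worth a human look;
    # anything below it is discarded outright
    for_review = [c for s, c in scored[1:] if s >= review_threshold]
    return best, for_review
```

The same function works whether the score is a single CLIP similarity or a weighted blend of several signals, since the classifier is injected rather than hard-coded.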