Lesson 97 of 1570
Kimi K2 — long-context workflow
Moonshot's Kimi K2 specializes in long documents and retrieval-heavy workflows. Here is when it beats a generalist.
Lesson map
What this lesson covers, in order:
1. A document-first chat model
2. Kimi K2
3. Long context
4. Retrieval
Section 1
A document-first chat model
Kimi K2 is tuned for uploads and long-document chat. Its attention mechanisms and instruction tuning emphasize consistent recall across hundreds of pages.
- Strong on multi-document synthesis
- Bilingual (Chinese + English) out of the box
- Context window reported in the hundreds of thousands of tokens, competitive with frontier models
- Agentic extensions for browser and file tools
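The multi-document synthesis workflow above usually comes down to packing several labeled documents into one long prompt so the model can cite sources by name. A minimal sketch, assuming plain-text files; the `build_multi_doc_prompt` helper and the delimiter format are illustrative, not part of Moonshot's API:

```python
from pathlib import Path

def build_multi_doc_prompt(paths, question):
    """Concatenate documents with labeled delimiters, then append the question."""
    parts = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        # Named delimiters let the model attribute claims to specific files.
        parts.append(f"=== BEGIN {p} ===\n{text}\n=== END {p} ===")
    corpus = "\n\n".join(parts)
    return f"{corpus}\n\nUsing only the documents above, {question}"
```

The resulting string becomes the `content` of a single user message; with a 128k-token window, dozens of ordinary documents fit in one call.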
Compare the options
| Task | Kimi K2 | Gemini 2.5 Pro | Grok 4.1 Fast |
|---|---|---|---|
| Multi-doc synthesis | Excellent | Excellent | Good |
| Chinese legal/finance | Excellent | Good | Good |
| Price | $$ | $$ | $ |
| Long-context QPS | Moderate | High | High |
Moonshot's API is OpenAI-compatible; the 128k and longer-context variants carry the Kimi brand.
```python
from openai import OpenAI  # Moonshot's endpoint speaks the OpenAI protocol

kimi_client = OpenAI(api_key="YOUR_KEY", base_url="https://api.moonshot.cn/v1")
# long_doc_prompt: your document text plus question, assembled beforehand
resp = kimi_client.chat.completions.create(
    model="moonshot-v1-128k",
    messages=[{"role": "user", "content": long_doc_prompt}],
)
```

Workflow tip
Kimi's UI handles drag-and-drop of dozens of files at once, which is smoother than most Western chat UIs for heavy research. Even if you ship on a different model, Kimi can be the research scratchpad.