Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends
There are too many open-weight models. A short, opinionated tour of the major families and what each is actually good at.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The open-weight skyline
2. open weights
3. model family
4. fine-tune
Section 1
The open-weight skyline
Open-weight models cluster into a few families with distinct personalities and strengths. Within each family, sizes range from 1B (laptop-friendly) to 70B+ (small cluster). Knowing the families saves you from drowning in Hugging Face.
Compare the options
| Family | Origin | Sweet spot | Reputation |
|---|---|---|---|
| Llama | Meta | General purpose, broad ecosystem | The default — well-supported |
| Mistral / Mixtral | Mistral AI (France) | Efficient, strong reasoning per parameter | European, MoE-friendly |
| Qwen | Alibaba | Coding, multilingual, long context | Often best-in-class at small sizes |
| DeepSeek | DeepSeek (China) | Reasoning and coding | Punches well above its size |
| Hermes / Nous | Community fine-tunes | Chattier, less refusal-y | Fine-tunes of base models |
| Phi | Microsoft Research | Tiny but capable | Great for embedded / edge |
| Gemma | Google | Light, well-tuned | Polished, conservative |
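Before filtering by family, check what your hardware can actually hold. A quick back-of-envelope rule: weight memory is roughly parameter count times bits per weight, divided by eight, plus some overhead. The sketch below uses an assumed 1.2x overhead factor and covers weights only; a real deployment also needs room for the KV cache.

```python
# Rough memory needed to hold a model's weights at a given quantization.
# The 1.2x overhead factor is an assumption; KV cache is NOT included.
def weight_memory_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Approximate GB of memory for model weights alone."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A 7B model quantized to 4 bits needs about 4.2 GB for weights:
print(round(weight_memory_gb(7), 1))
# A 70B model at 16-bit precision needs well over 100 GB — cluster territory:
print(round(weight_memory_gb(70, bits_per_weight=16), 1))
```

This is why the laptop recommendations below all sit around 7-8B: at 4-bit quantization they fit comfortably in 8 GB of RAM or VRAM.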
How to pick — by job, not by hype
1. Coding assistant on a laptop: Qwen-coder, Llama-code, or DeepSeek-coder at 7-8B
2. General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits
3. Long-context document analysis: Qwen long variants; strong tokenization for non-English
4. Small device or embedded: Phi or Gemma at 1-3B
5. Fewer refusals (research-only, not production): a Hermes or Nous fine-tune
Reading a model card before you commit
- License: not all open-weight models are commercially usable
- Context length: advertised vs. effective often differ
- Tokenizer: matters for non-English performance and cost estimation
- Base vs Instruct: are you getting the chat-tuned version?
- Eval scores: take vendor numbers with a grain of salt; trust independent leaderboards more
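The checklist above can be turned into a small vetting function. The field names below are assumptions, not a real model-card schema — adapt them to whatever the card you are reading actually states.

```python
# Hypothetical model-card checklist. The dict keys are illustrative
# assumptions, not a standard schema — map them to the real card's fields.
def vet_model_card(card: dict) -> list[str]:
    """Return a list of red flags worth checking before committing."""
    warnings = []
    if "commercial" not in card.get("license_terms", ""):
        warnings.append("license may forbid commercial use")
    if card.get("effective_context", 0) < card.get("advertised_context", 0):
        warnings.append("effective context shorter than advertised")
    if not card.get("instruct_tuned", False):
        warnings.append("base model: expect raw completions, not chat")
    return warnings

# Example: a research-only base model with an optimistic context claim.
card = {"license_terms": "research only",
        "advertised_context": 128_000,
        "effective_context": 32_000,
        "instruct_tuned": False}
for w in vet_model_card(card):
    print("warning:", w)
```

None of these checks replace reading the card, but encoding them keeps you from skipping a step when comparing half a dozen candidates.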
Apply this
- Pick three models from three different families that fit your hardware
- Run the same five-prompt eval set on each
- Pick the one that wins on your real task — not the one with the highest leaderboard rank
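The three-model bake-off above is a simple loop. The sketch below stubs out the model call and the judging step with placeholder lambdas, since how you query each model (Ollama, a llama.cpp server, an API) and how you score answers are up to you; the model names are examples, not recommendations.

```python
# Sketch of the three-model, five-prompt bake-off. `ask` and `judge`
# are stand-ins (assumptions) for your real model call and scoring rule.
def run_eval(models, prompts, ask, judge):
    """Score each model on every prompt; return {model: total_score}."""
    scores = {}
    for model in models:
        scores[model] = sum(judge(p, ask(model, p)) for p in prompts)
    return scores

# Toy stand-ins so the sketch runs; replace with real calls and judging.
ask = lambda model, prompt: f"{model} answer to {prompt}"
judge = lambda prompt, answer: 1 if "answer" in answer else 0

results = run_eval(
    ["llama-3-8b", "qwen-2.5-7b", "mistral-7b"],   # example model names
    ["p1", "p2", "p3", "p4", "p5"],                # your five real prompts
    ask, judge)
print(results)
```

Keep the prompt set fixed across models and drawn from your actual workload; a leaderboard-style generic prompt set defeats the point of the exercise.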
Key terms in this lesson
The big idea: the right local model is the one that wins on your prompts, on your hardware. Family names are a starting filter, not an answer.
