Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends

There are too many open-weight models. A short, opinionated tour of the major families and what each is actually good at.

10 min · Reviewed 2026

The open-weight skyline

Open-weight models cluster into a few families with distinct personalities and strengths. Within most families, sizes range from around 1B parameters (laptop-friendly) to 70B and beyond (a high-memory workstation or multi-GPU server, even when quantized). Knowing the families saves you from drowning in Hugging Face.

Family	Origin	Sweet spot	Reputation
Llama	Meta	General purpose, broad ecosystem	The default — well-supported
Mistral / Mixtral	Mistral AI (France)	Efficient, strong reasoning per parameter	European, MoE-friendly
Qwen	Alibaba	Coding, multilingual, long context	Often best-in-class at small sizes
DeepSeek	DeepSeek (China)	Reasoning and coding	Punches well above its size
Hermes / Nous	Nous Research (community fine-tunes)	Chattier, less refusal-y	Fine-tunes of other families' base models
Phi	Microsoft Research	Tiny but capable	Great for embedded / edge
Gemma	Google	Light, well-tuned	Polished, conservative

How to pick — by job, not by hype

Coding assistant on a laptop: Qwen Coder, Code Llama, or DeepSeek Coder at around 7-8B
General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits
Long-context document analysis: Qwen long-context variants, whose tokenizer also handles non-English text efficiently
Small device or embedded: Phi or Gemma at 1-3B
Fewer refusals (research-only, not production): a Hermes or Nous fine-tune
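To make "the largest size that fits" concrete, here is a rough back-of-the-envelope sketch. It assumes weight memory is roughly parameter count times bits per weight, plus a flat overhead factor for the KV cache and runtime. The 1.2 overhead factor and both function names are illustrative assumptions, not measured values, so treat the output as a first filter only.

```python
def estimated_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough memory needed to load a model, in GB (1 GB = 1e9 bytes).

    overhead is an assumed fudge factor for KV cache and runtime buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def largest_that_fits(sizes_billion, memory_gb, bits_per_weight=4):
    """Return the largest parameter count (in billions) that fits, or None."""
    fitting = [s for s in sizes_billion if estimated_gb(s, bits_per_weight) <= memory_gb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for size in (3, 8, 70):
        print(f"{size}B at 4-bit: ~{estimated_gb(size, 4):.1f} GB")
    print(largest_that_fits([1, 3, 8, 14, 32, 70], memory_gb=16))
```

Real usage depends on the runtime, the quantization format, and how much context you load, so confirm with an actual run before committing to a size.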

Reading a model card before you commit

License: not all open-weight models are commercially usable
Context length: advertised vs. effective often differ
Tokenizer: matters for non-English performance and cost estimation
Base vs Instruct: are you getting the chat-tuned version?
Eval scores: take vendor numbers with a grain of salt; trust independent leaderboards more
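The mechanical parts of this checklist (license, context length, base vs instruct) can be turned into a quick filter. The sketch below is one way to do that; the `ModelCard` fields, the `shortlist` helper, and the three placeholder entries are all hypothetical and must be filled in by hand from the real model cards. Tokenizer behavior and eval scores still need a human read.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    name: str
    commercial_use: bool   # does the license permit your intended use?
    context_tokens: int    # advertised context length, not the effective one
    is_instruct: bool      # chat/instruct-tuned rather than base


def shortlist(cards, min_context, need_commercial=True):
    """Keep instruct models that pass the license and context checks."""
    return [
        c.name
        for c in cards
        if c.is_instruct
        and c.context_tokens >= min_context
        and (c.commercial_use or not need_commercial)
    ]


# Placeholder entries, not real models: replace with values from actual cards.
cards = [
    ModelCard("model-a-instruct", True, 32_768, True),
    ModelCard("model-b-base", True, 131_072, False),
    ModelCard("model-c-instruct", False, 8_192, True),
]
print(shortlist(cards, min_context=8_192))
```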

Apply this

Pick three models from three different families that fit your hardware
Run the same five-prompt eval set on each
Pick the one that wins on your real task — not the one with the highest leaderboard rank
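The three steps above can be sketched as a tiny harness. This is a minimal sketch under stated assumptions: a "model" is any function from prompt to response (wiring it to your local runtime is left out), the scorer is a crude substring check, and the stub models and two sample cases are made up for illustration. A real run would use your own five prompts and a scoring rule that matches your task.

```python
from typing import Callable, Dict, List

# Assumption: each model is wrapped as a plain prompt -> response function.
Model = Callable[[str], str]
Scorer = Callable[[str, str], float]


def run_eval(models: Dict[str, Model], prompts: List[str], score: Scorer) -> Dict[str, float]:
    """Run every prompt through every model; return the mean score per model."""
    results = {}
    for name, generate in models.items():
        scores = [score(prompt, generate(prompt)) for prompt in prompts]
        results[name] = sum(scores) / len(scores)
    return results


def pick_winner(results: Dict[str, float]) -> str:
    """Return the name of the model with the highest mean score."""
    return max(results, key=results.get)


# Illustrative cases: prompt -> substring a good answer should contain.
cases = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}


def substring_score(prompt: str, response: str) -> float:
    return 1.0 if cases[prompt] in response else 0.0


# Stub models standing in for real local models.
stub_models = {
    "stub-a": lambda prompt: "4",
    "stub-b": lambda prompt: "Paris is the capital of France, and 2 + 2 is 4.",
}

results = run_eval(stub_models, list(cases), substring_score)
print(results, "winner:", pick_winner(results))
```

Keeping the prompt set and scorer fixed across models is the point: the comparison is only fair if every model sees exactly the same inputs.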

The big idea: the right local model is the one that wins on your prompts, on your hardware. Family names are a starting filter, not an answer.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-local-choosing-a-model-creators

What is the core idea behind "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends"?
1. There are too many open-weight models. A short, opinionated tour of the major families and what each is actually good at.
2. settings
3. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
4. You prefer point-and-click to terminal for any reason — that is a valid reason
Which term best describes a foundational idea in "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends"?
1. instruct tune
2. open weights
3. MoE
4. fine-tune
A learner studying Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends would need to understand which concept?
1. open weights
2. MoE
3. instruct tune
4. fine-tune
Which of these is directly relevant to Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. open weights
2. instruct tune
3. fine-tune
4. MoE
Which of the following is a key point about Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. Coding assistant on a laptop: Qwen Coder, Code Llama, or DeepSeek Coder at around 7-8B
2. General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits
3. Long-context document analysis: Qwen long variants — strong tokenization for non-English
4. Small device or embedded: Phi or Gemma at 1-3B
Which of these does NOT belong in a discussion of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. settings
2. Long-context document analysis: Qwen long variants — strong tokenization for non-English
3. Coding assistant on a laptop: Qwen Coder, Code Llama, or DeepSeek Coder at around 7-8B
4. General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits
Which statement is accurate regarding Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. Context length: advertised vs. effective often differ
2. Tokenizer: matters for non-English performance and cost estimation
3. License: not all open-weight models are commercially usable
4. Base vs Instruct: are you getting the chat-tuned version?
Which of these does NOT belong in a discussion of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. Tokenizer: matters for non-English performance and cost estimation
2. License: not all open-weight models are commercially usable
3. Context length: advertised vs. effective often differ
4. settings
What is the key insight about "Base vs Instruct vs Chat" in the context of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. Open-weight models almost always ship in multiple flavors: base (unaligned, predicts text), instruct (follows instructio…
2. settings
3. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
4. You prefer point-and-click to terminal for any reason — that is a valid reason
What is the key insight about "Fine-tunes are a wild west" in the context of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. settings
2. Anyone can post a 'better' fine-tune on Hugging Face. Quality varies dramatically.
3. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
4. You prefer point-and-click to terminal for any reason — that is a valid reason
What is the key insight about "From the community" in the context of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. settings
2. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
3. Recent r/LocalLLaMA model-recommendation threads converge on a few names: Qwen variants for coding and long context, Dee…
4. You prefer point-and-click to terminal for any reason — that is a valid reason
Which statement accurately describes an aspect of Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. settings
2. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
3. You prefer point-and-click to terminal for any reason — that is a valid reason
4. Open-weight models cluster into a few families with distinct personalities and strengths.
What does working with Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends typically involve?
1. The big idea: the right local model is the one that wins on your prompts, on your hardware.
2. settings
3. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
4. You prefer point-and-click to terminal for any reason — that is a valid reason
Which best describes the scope of "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends"?
1. It is unrelated to model-families workflows
2. It focuses on a short, opinionated tour of the major open-weight model families and what each is actually good at
3. It applies only to the opposite beginner tier
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends?
1. settings
2. On a recent M-series Mac, a small local model can beat a slow cloud call to time…
3. How to pick — by job, not by hype
4. You prefer point-and-click to terminal for any reason — that is a valid reason
