Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends
There are too many open-weight models. A short, opinionated tour of the major families and what each is actually good at.
10 min · Reviewed 2026
The open-weight skyline
Open-weight models cluster into a few families, each with its own personality and strengths. Most families ship several sizes, from around 1B parameters (laptop-friendly) to 70B and beyond (a multi-GPU workstation or a small cluster). Knowing the families saves you from drowning in Hugging Face.
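To make that size range concrete, here is a rough back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameters × bits per weight ÷ 8) plus a hypothetical 20% overhead standing in for the KV cache and runtime buffers. Real numbers vary with the runtime, the quantization format, and the context length. The helper names `estimate_gb` and `largest_that_fits` are illustrative, not from any library.

```python
from typing import Optional

# Rough memory estimate for running a local model (weights-dominated).
# Assumption: GB ~ billions of parameters x (bits per weight / 8) x overhead,
# where the hypothetical 1.2 overhead stands in for KV cache and runtime buffers.


def estimate_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory (GB) to load and run a model of the given size."""
    bytes_per_weight = bits_per_weight / 8
    # 1e9 parameters x bytes per parameter = GB, so the billions cancel neatly.
    return params_billion * bytes_per_weight * overhead


def largest_that_fits(sizes_billion, budget_gb: float, bits_per_weight: float,
                      overhead: float = 1.2) -> Optional[float]:
    """Largest model size whose estimate fits the memory budget, or None if none fit."""
    fitting = [s for s in sizes_billion if estimate_gb(s, bits_per_weight, overhead) <= budget_gb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for size in (1, 8, 70):
        row = ", ".join(f"{bits}-bit ~{estimate_gb(size, bits):.1f} GB" for bits in (16, 8, 4))
        print(f"{size}B: {row}")
    # Example: a laptop with ~16 GB usable memory running 4-bit quantized weights.
    print("largest fit:", largest_that_fits([1, 3, 8, 70], budget_gb=16, bits_per_weight=4))
```

By this estimate a 70B model needs roughly 168 GB at 16-bit precision but about 42 GB at 4-bit, while an 8B model at 4-bit needs under 5 GB, which is why quantized 7-8B models are the usual laptop choice.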
Family | Origin | Sweet spot | Reputation
Llama | Meta | General purpose, broad ecosystem | The default; well-supported
Mistral / Mixtral | Mistral AI (France) | Efficient, strong reasoning per parameter | European, MoE-friendly
Qwen | Alibaba | Coding, multilingual, long context | Often best-in-class at small sizes
DeepSeek | DeepSeek (China) | Reasoning and coding | Punches well above its size
Hermes / Nous | Nous Research (community fine-tunes) | Chattier, fewer refusals | Built on other families' base models
Phi | Microsoft Research | Tiny but capable | Great for embedded / edge
Gemma | Google | Light, well-tuned | Polished, conservative
How to pick — by job, not by hype
Coding assistant on a laptop: Qwen Coder, Code Llama, or DeepSeek-Coder at roughly 7-8B
General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits
Long-context document analysis: Qwen's long-context variants, whose tokenizer also handles non-English text well
Small device or embedded: Phi or Gemma at 1-3B
Fewer refusals (research-only, not production): a Hermes or Nous fine-tune
Reading a model card before you commit
License: not all open-weight models are commercially usable
Context length: the advertised window and the length at which quality actually holds up often differ
Tokenizer: matters for non-English performance and cost estimation
Base vs Instruct: are you getting the chat-tuned version?
Eval scores: take vendor numbers with a grain of salt; trust independent leaderboards more
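Part of this checklist can be turned into a small pre-download script. This is a minimal sketch that assumes you copy a few fields by hand into plain dicts: the license identifier from the model card and `max_position_embeddings` from the model's config file (a field many Hugging Face configs use, though not all). The allowlist and the instruct-name heuristic are hypothetical and should be adapted; the function name `review_model` is illustrative. It covers the license, context, and base-vs-instruct points only; tokenizer fit and eval scores still need a human.

```python
# Hypothetical pre-download check mirroring the model-card checklist.
# You fill `card` and `config` by hand; field names are assumptions:
#   card["license"]                    -> license identifier from the model card
#   config["max_position_embeddings"] -> configured maximum context (many HF configs)

COMMERCIAL_OK = {"apache-2.0", "mit"}  # assumption: replace with your own legal allowlist
INSTRUCT_MARKERS = ("instruct", "chat", "-it")  # naming heuristic only, not a guarantee


def review_model(name: str, card: dict, config: dict,
                 needed_context: int, need_commercial: bool) -> list[str]:
    """Return warnings for one candidate; an empty list means nothing was flagged."""
    warnings = []

    # License: not every open-weight license permits commercial use.
    license_id = str(card.get("license", "unknown")).lower()
    if need_commercial and license_id not in COMMERCIAL_OK:
        warnings.append(f"license '{license_id}' is not on the allowlist; read the terms")

    # Context: the configured maximum is an upper bound; effective context may be lower.
    max_ctx = config.get("max_position_embeddings")
    if max_ctx is None:
        warnings.append("max context not found in config; check the card")
    elif max_ctx < needed_context:
        warnings.append(f"max context {max_ctx} is below the {needed_context} you need")

    # Base vs instruct: chat-tuned releases usually say so in their name.
    if not any(marker in name.lower() for marker in INSTRUCT_MARKERS):
        warnings.append("name has no instruct/chat marker; this may be a base model")

    return warnings


if __name__ == "__main__":
    # Made-up candidates for illustration only.
    print(review_model("example-7b-instruct", {"license": "apache-2.0"},
                       {"max_position_embeddings": 32768}, needed_context=16000, need_commercial=True))
    print(review_model("example-7b", {"license": "custom-research-only"},
                       {"max_position_embeddings": 4096}, needed_context=16000, need_commercial=True))
```

The first made-up candidate passes cleanly; the second trips all three warnings. Even a clean result only means nothing was flagged, so the effective context still needs testing on your own long documents.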
Apply this
Pick three models from three different families that fit your hardware
Run the same five-prompt eval set on each
Pick the one that wins on your real task, not the one with the highest leaderboard rank
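These three steps fit in a few lines of code. This sketch assumes each model is wrapped as a plain function from prompt to completion (how you call your local runtime is up to you) and that each of your five prompts comes with its own scoring check. The stub models, the meeting notes, the checks, and the names `run_eval` and `pick_winner` are all hypothetical placeholders for your real task.

```python
from typing import Callable

# A model is any function from prompt to completion (wrap your local runtime here).
Model = Callable[[str], str]
# A check scores one completion between 0.0 and 1.0 for your real task.
Check = Callable[[str], float]


def mentions(word: str) -> Check:
    """Check that passes when the completion contains `word` (case-insensitive)."""
    return lambda out: 1.0 if word.lower() in out.lower() else 0.0


def one_word(out: str) -> float:
    """Check that passes when the completion is exactly one word."""
    return 1.0 if len(out.split()) == 1 else 0.0


def not_refused(out: str) -> float:
    """Crude refusal check: fails if the completion says it cannot help."""
    return 0.0 if "cannot" in out.lower() else 1.0


def run_eval(models: dict[str, Model], cases: list[tuple[str, Check]]) -> dict[str, float]:
    """Run every prompt on every model and return each model's mean score."""
    if not cases:
        raise ValueError("need at least one eval case")
    return {
        name: sum(check(generate(prompt)) for prompt, check in cases) / len(cases)
        for name, generate in models.items()
    }


def pick_winner(scores: dict[str, float]) -> str:
    """Name of the highest-scoring model (the first one wins ties)."""
    return max(scores, key=scores.get)


# Hypothetical task: answer questions about meeting notes.
NOTES = "Notes: the budget review moved to Friday at 3pm in room B."
CASES = [
    (f"{NOTES}\nWhat day is the meeting?", mentions("friday")),
    (f"{NOTES}\nWhat time is the meeting?", mentions("3pm")),
    (f"{NOTES}\nWhich room? Answer in one word.", one_word),
    (f"{NOTES}\nSummarize the notes.", not_refused),
    (f"{NOTES}\nWhat kind of meeting is it?", mentions("budget")),
]

# Stand-in models from three families; replace each lambda with a real call.
MODELS = {
    "family-a": lambda prompt: "Friday",
    "family-b": lambda prompt: "I cannot help with that.",
    "family-c": lambda prompt: "The budget review is on Friday at 3pm.",
}

if __name__ == "__main__":
    scores = run_eval(MODELS, CASES)
    print(scores)
    print("winner:", pick_winner(scores))
```

With these stubs, the terse model wins the one-word question but loses overall (0.6) to the model that answers every factual question (0.8). That is the point of scoring the whole set on your task rather than judging single impressive answers.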
The big idea: the right local model is the one that wins on your prompts, on your hardware. Family names are a starting filter, not an answer.
End-of-lesson check
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-local-choosing-a-model-creators
What is the main idea of "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends"?
There are too many open-weight models. A short, opinionated tour of the major families and what each is actually good at.
Use AI as the final authority for the whole decision
Avoid checking the answer once it sounds polished
Focus only on speed instead of judgment
Which concept is most central to "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends"?
model family
open weights
fine-tune
instruction tuning
Which use of AI fits this topic best?
Let the AI decide what matters without your review
Use the answer before checking whether it fits the situation
Coding assistant on a laptop: Qwen Coder, Code Llama, or DeepSeek-Coder at roughly 7-8B
Treat the AI output as automatically correct
What should a careful learner remember about "Base vs Instruct vs Chat"?
Use AI to draft or organize ideas about open weights, then verify before acting.
Skip the context so the tool can guess faster
Treat the output as private even after sharing it online
Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
Act immediately because the AI answer is written clearly
Use AI for drafting and comparison, but verify before publishing or relying on it.
Hide uncertainty so the final answer looks cleaner
Use private or sensitive details before checking permission
How should AI output about open weights be treated?
As proof that no other source is needed
As a replacement for context, consent, or expert review
As a draft or helper output that still needs human judgment and verification
As something that becomes correct when it sounds confident
Name one way to verify an AI answer about open weights.
Which action would help you apply "Choosing a Local Model: Llama, Mistral, Hermes, Qwen, DeepSeek, and Friends" responsibly?
Use the tool to avoid thinking through the tradeoff
Keep going even if the output conflicts with a trusted source
Treat the AI output as automatically correct
General chat with broad world knowledge: Llama 3.x or 4.x at the largest size that fits