Lesson 488 of 1596
Local Model Family: Gemma
Gemma is Google DeepMind's open-model family, useful for local and single-accelerator experiments when students want polished small models.
Creators · Model Families · ~24 min read
Why Gemma matters locally
Gemma is a useful local-model lesson because its typical uses are easy to name: small local assistants, education demos, research baselines, and side-by-side comparisons of Google-style open models with Qwen, Mistral, and Llama. The point is not to crown a permanent winner. The point is to learn how to match a model family to hardware, task, license, and risk.
Compare the options
| Question | What students should inspect | Why it matters |
|---|---|---|
| Can it run here? | Size, quantization, RAM, VRAM, runtime support | A model that barely loads is not a usable assistant |
| Is it good for this task? | Fit with Gemma's typical uses: small local assistants, education demos, research baselines, and comparisons with Qwen, Mistral, and Llama | Family reputation only matters when the workload matches |
| Can we legally use it? | License, use policy, model card, redistribution terms | Open weights do not all mean the same rights |
| How do we know? | A small eval set with speed, quality, and failure notes | Local models should be chosen with evidence, not vibes |
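The "Can it run here?" row can be turned into quick arithmetic. The sketch below estimates weight memory from parameter count and bits per weight. The function names, the 4-billion-parameter example, the 8 GiB budget, and the 1.2 overhead factor (a rough allowance for context cache and runtime buffers) are all assumptions for illustration, not figures from any Gemma model card.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3


def fits_in_memory(params_billion, bits_per_weight, available_gib, overhead_factor=1.2):
    """Rough check: weights plus an assumed overhead must fit in available memory.

    overhead_factor is a guess, not a measured value; real overhead depends on
    the runtime and the context length.
    """
    needed = weight_memory_gib(params_billion, bits_per_weight) * overhead_factor
    return needed <= available_gib


# Hypothetical example: a 4-billion-parameter model on a machine with 8 GiB free.
for bits in (16, 8, 4):
    size = weight_memory_gib(4, bits)
    verdict = fits_in_memory(4, bits, 8)
    print(f"{bits}-bit: ~{size:.1f} GiB of weights, fits in 8 GiB: {verdict}")
```

Treat the result as a first filter only. A model that passes this check can still be too slow to be a usable assistant, so measure on the actual machine before deciding.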
Build the small version
Create a Gemma model card reader: students extract size, license terms, intended uses, unsafe uses, and runtime requirements.
1. Pick one exact model file or runtime tag from the current model card.
2. Run three short prompts: one easy, one task-specific, and one likely failure case.
3. Record load time, response speed, memory pressure, answer quality, and one surprising failure.
4. Write a one-paragraph recommendation: use it, do not use it, or use it only for a narrow job.
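One way to keep the three-prompt run honest is a tiny timing harness. This is a minimal sketch, not a real runtime integration: `placeholder_generate` is a hypothetical stand-in that students would replace with a call into whatever runtime they chose, and the `PromptResult` fields are one possible layout for the notes. It records response time only; load time and memory pressure still need to be observed separately.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptResult:
    label: str          # "easy", "task", or "failure"
    prompt: str
    response: str
    seconds: float      # wall-clock time for this one response
    notes: str = ""     # filled in by hand: quality, surprises, failures


def run_prompts(generate: Callable[[str], str], prompts: Dict[str, str]) -> List[PromptResult]:
    """Run each prompt once and record how long the response took."""
    results = []
    for label, prompt in prompts.items():
        start = time.perf_counter()
        response = generate(prompt)
        elapsed = time.perf_counter() - start
        results.append(PromptResult(label, prompt, response, elapsed))
    return results


def placeholder_generate(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real call to the local runtime."""
    return f"(placeholder answer to: {prompt})"


# Example prompts are illustrative only.
prompts = {
    "easy": "Say hello in one sentence.",
    "task": "Summarize this paragraph for a ninth grader: ...",
    "failure": "What happened in the news this morning?",
}
for result in run_prompts(placeholder_generate, prompts):
    print(f"{result.label}: {result.seconds:.3f}s")
```

Passing the generator in as a function keeps the harness independent of any one runtime, so the same three prompts can be rerun against a Qwen, Mistral, or Llama model for comparison.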
A classroom-safe design sketch for this local-model family.
model_card_notes:
  family: Gemma
  size: check_current_card
  quantized_available: yes_or_no
  intended_use: classroom_demo_or_app
  license_terms: summarize_before_use
  safety_notes: copy_key_limits
  runtime: ollama_lmstudio_or_transformers

Key terms in this lesson
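The notes template can also be kept as a small typed record, so it is obvious when a field still holds its placeholder. The sketch below mirrors the field names from the template; the `ModelCardNotes` class and the `unfilled_fields` helper are hypothetical names introduced for illustration, and the code does not read a real model card.

```python
from dataclasses import dataclass, fields
from typing import List

# Placeholder values taken from the template; each one means "not done yet".
PLACEHOLDERS = {
    "check_current_card",
    "yes_or_no",
    "classroom_demo_or_app",
    "summarize_before_use",
    "copy_key_limits",
    "ollama_lmstudio_or_transformers",
}


@dataclass
class ModelCardNotes:
    family: str = "Gemma"
    size: str = "check_current_card"
    quantized_available: str = "yes_or_no"
    intended_use: str = "classroom_demo_or_app"
    license_terms: str = "summarize_before_use"
    safety_notes: str = "copy_key_limits"
    runtime: str = "ollama_lmstudio_or_transformers"


def unfilled_fields(notes: ModelCardNotes) -> List[str]:
    """Return the names of fields that are blank or still hold a placeholder."""
    missing = []
    for field in fields(notes):
        value = getattr(notes, field.name).strip()
        if not value or value in PLACEHOLDERS:
            missing.append(field.name)
    return missing


# A fresh template has everything except the family name left to fill in.
print(unfilled_fields(ModelCardNotes()))
```

Students copy each value from the current model card by hand; the check only confirms that nothing was skipped, not that the copied text is correct.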
The big idea: remember the model card reader. Local model work is product design under constraints, not just downloading the model with the loudest leaderboard score.
