Ollama: The Easy On-Ramp to Local Models
Ollama is the curl-and-go answer to running an LLM on your own machine. Here is what it actually does, the commands that matter, and the seams you will hit when you push it.
Creators · Model Families · ~6 min read
What Ollama is and is not
Ollama is a small, polished CLI and background service that downloads model weights, manages them on disk, and serves them over HTTP, including OpenAI-compatible endpoints under /v1. It is a wrapper rather than an engine: it bundles llama.cpp under the hood to do the actual inference. What Ollama adds on top is the developer experience: naming, versioning, a clean install, and a curated library of models you can pull by short name.
The five commands that cover most of life
Ollama treats local models like Docker treats containers: pull by name, run anywhere.
    # Install (macOS shown)
    brew install ollama
    ollama serve &        # background server on localhost:11434

    # Pull and run a model
    ollama run llama3.1:8b

    # Manage models
    ollama list           # what's installed
    ollama rm <model>     # free disk
    ollama show <model>   # parameters, quantization, context

    # Use the API from any code
    curl http://localhost:11434/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "llama3.1:8b",
        "messages": [{"role":"user","content":"Hello"}]
      }'

Compare the options
| Need | Ollama | Native llama.cpp |
|---|---|---|
| First model running | Minutes | An afternoon |
| Switching between models | One command | Manual file management |
| Custom prompt template | Modelfile | Command-line flags |
| Squeezing maximum performance | Decent defaults | Full control |
| Fits in a containerized deployment | Excellent | Workable |
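The curl request from the command list can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes `ollama serve` is listening on the default port shown in this lesson and that the response follows the usual OpenAI chat-completion shape; the function names are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: the server runs on the default address used in this lesson.
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model, user_text, base_url=BASE_URL):
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion dict."""
    return response_body["choices"][0]["message"]["content"]


def chat(model, user_text, timeout=120):
    """Send one user message and return the reply text (needs a running server)."""
    request = build_chat_request(model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running and the model pulled, `chat("llama3.1:8b", "Hello")` should return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.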
Modelfiles: customizing without forking
A Modelfile bakes in the system prompt and parameters so callers do not have to remember them.
    FROM llama3.1:8b
    SYSTEM """
    You are a careful editor. Always ask for the source before making any factual claim. Refuse to invent quotes.
    """
    PARAMETER temperature 0.2
    PARAMETER num_ctx 8192

Apply this
1. Install Ollama and run two different models, one small and one larger.
2. Wire an existing OpenAI-SDK script to talk to Ollama by changing only the base URL and the model name.
3. Write a Modelfile that pins a system prompt and a temperature for one of those models.
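For the Modelfile step, one option is to generate the file from code so the system prompt and parameters live next to the rest of a project. This is a minimal sketch, not an official Ollama tool: `render_modelfile` is a hypothetical helper, and it covers only the `FROM`, `SYSTEM`, and `PARAMETER` instructions used in this lesson.

```python
def render_modelfile(base_model, system_prompt, parameters):
    """Return Modelfile text with FROM, SYSTEM, and PARAMETER lines."""
    # The SYSTEM block is delimited by triple quotes, so reject prompts
    # that would close the block early.
    if '"""' in system_prompt:
        raise ValueError("system prompt must not contain triple quotes")
    lines = [
        f"FROM {base_model}",
        'SYSTEM """',
        system_prompt.strip(),
        '"""',
    ]
    # One PARAMETER line per setting, in the order given.
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


modelfile_text = render_modelfile(
    "llama3.1:8b",
    "You are a careful editor. Refuse to invent quotes.",
    {"temperature": 0.2, "num_ctx": 8192},
)
```

Saving `modelfile_text` to a file named `Modelfile` and running `ollama create careful-editor -f Modelfile` would register the customized model, after which `ollama run careful-editor` starts it. The name `careful-editor` is only an example.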
The big idea: Ollama is the path of least resistance into local LLMs. Start here, learn the seams, then graduate where you need more control.
