Ollama: The Easy On-Ramp to Local Models
Ollama is the curl-and-go answer to running an LLM on your own machine. Here is what it actually does, the commands that matter, and the seams you will hit when you push it.
What this lesson covers
- What Ollama is and is not
- Ollama as a model runner
- The OpenAI-compatible API
What Ollama is and is not
Ollama is a small, polished CLI and background service that downloads model weights, manages them on disk, and serves them over an OpenAI-compatible HTTP API. It bundles llama.cpp under the hood as the actual inference engine. What Ollama gives you on top is the developer experience — naming, versioning, a clean install, and a curated library of models you can pull by short name.
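Because the server speaks the OpenAI wire format, any HTTP client can talk to it. As a quick sanity check, here is a minimal sketch that lists installed models; it assumes Ollama is running on the default port 11434 and that your Ollama version exposes the OpenAI-compatible /v1/models route.

```python
# sanity_check.py — assumes Ollama is serving on the default port 11434
# and that this Ollama version exposes the OpenAI-compatible /v1/models route.
import requests

resp = requests.get("http://localhost:11434/v1/models", timeout=5)
resp.raise_for_status()

# The response mirrors OpenAI's model-list shape: {"object": "list", "data": [...]}
for model in resp.json().get("data", []):
    print(model["id"])  # e.g. "llama3.1:8b"
```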
The five commands that cover most of life
Ollama treats local models like Docker treats containers: pull by name, run anywhere.
```bash
# Install (macOS shown)
brew install ollama
ollama serve &          # background server on localhost:11434

# Pull and run a model
ollama run llama3.1:8b

# Manage models
ollama list             # what's installed
ollama rm <model>       # free disk
ollama show <model>     # parameters, quantization, context

# Use the API from any code
curl http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role":"user","content":"Hello"}]
  }'
```

Compare the options
| Need | Ollama | Native llama.cpp |
|---|---|---|
| First model running | Minutes | An afternoon |
| Switching between models | One command | Manual file management |
| Custom prompt template | Modelfile | Command-line flags |
| Squeezing maximum performance | Decent defaults | Full control |
| Fits in a containerized deployment | Excellent | Workable |
Modelfiles: customizing without forking
A Modelfile bakes in the system prompt and parameters so callers do not have to remember them.
```
FROM llama3.1:8b

SYSTEM """
You are a careful editor. Always ask for the source
before making any factual claim. Refuse to invent quotes.
"""

PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```
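Once the Modelfile is saved, `ollama create careful-editor -f Modelfile` registers it under a new name (`careful-editor` is just an illustrative name), and `ollama run careful-editor` or the same HTTP API serves it with the baked-in prompt and parameters.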
Apply this
1. Install Ollama and run two different models — one small, one larger.
2. Wire an existing OpenAI-SDK script to talk to Ollama by changing only the base URL (see the sketch after this list).
3. Write a Modelfile that pins a system prompt and a temperature for one of those models.
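For step 2, the change really is just the base URL plus a throwaway API key, since Ollama does not check it. A minimal sketch with the official `openai` Python package; the model name assumes you pulled `llama3.1:8b` as above.

```python
# chat_local.py — points the OpenAI SDK at the local Ollama server.
# Assumes `ollama serve` is running and `llama3.1:8b` has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

Swap the base URL back and the same script talks to the hosted API again.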
The big idea: Ollama is the path of least resistance into local LLMs. Start here, learn the seams, then graduate where you need more control.