Ollama Basics: Running a Model Yourself
Ollama turns 'I want to run an LLM locally' into a one-line install and a two-word command. Here's the stack, the key commands, and the models worth pulling first.
What Ollama is
Ollama is a command-line tool that downloads, manages, and serves local LLMs. Under the hood it runs on llama.cpp, a fast open-source inference runtime, and packages models in the GGUF format. You get a localhost API that any app can call — including Claude Code, CrewAI, LangGraph, and OpenClaw.
Install and first model
From zero to local model in three commands.
# macOS: one-line install
brew install ollama
# Or download the app from ollama.com
# Start the background server
ollama serve &
# Pull and run Llama 4 (8B — works on most modern laptops)
ollama run llama4:8b
# You're now chatting with a local model. Ctrl-D to exit.
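Before moving on, it's worth confirming the server is actually up. One quick sanity check uses Ollama's native API to list what's installed; if it answers with JSON, the server is healthy.

# Sanity check: ask the native API for the installed models
curl http://localhost:11434/api/tags
# A JSON "models" array (containing llama4:8b after the pull above)
# means the server is up and ready.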
The commands worth knowing
Core Ollama commands and the OpenAI-compatible API.
ollama list # what's installed
ollama pull qwen3.5:8b # download (no run)
ollama run gemma4:4b # download if needed, then chat
ollama rm llama4:70b # free disk space
ollama ps # what's loaded in memory
ollama show qwen3.5:8b # details: params, context, quant
# API access (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:8b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
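The same server also exposes Ollama's native endpoints. As a sketch, a one-shot completion against /api/generate looks like this (same local model; stream disabled so the response arrives as a single JSON object):

# Native API: one-shot, non-streaming completion
curl http://localhost:11434/api/generate \
  -d '{
    "model": "qwen3.5:8b",
    "prompt": "Hello",
    "stream": false
  }'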
Models to try (April 2026)
Compare the options
| Model | Size | Best for |
|---|---|---|
| llama4:8b | ~5 GB | General chat, balanced speed/quality. |
| qwen3.5:8b | ~5 GB | Strong function calling + coding. |
| gemma4:12b | ~8 GB | Google's frontier-at-size model, reasoning-tuned. |
| qwen3.5:32b | ~20 GB | Near-frontier quality on a 32 GB Mac. |
| deepseek-coder:16b | ~10 GB | Code-focused. Fast on a laptop GPU. |
| llama4:70b | ~40 GB | Highest quality. Needs a Mac Studio or a real GPU box. |
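Disk fills up fast at these sizes. One way to see what pulled models are costing you, assuming Ollama's default model store location on macOS/Linux:

# Models live under ~/.ollama/models by default
du -sh ~/.ollama/models
# Per-model sizes also appear in:
ollama list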
Using Ollama with agents
Because Ollama exposes an OpenAI-compatible API at localhost:11434, it drops into any agent framework that speaks the OpenAI API. Point CrewAI, LangGraph, OpenClaw, or AutoGen at that URL and they'll happily run against your local model. As of March 2026, llama.cpp merged full MCP client support — meaning you can plug MCP servers (GitHub, Notion, Supabase) into a local Qwen or Llama too.
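Many OpenAI SDKs (and the frameworks built on them) read the standard base-URL and key environment variables, so often no code change is needed at all. A minimal sketch — the key value is an arbitrary placeholder, since Ollama ignores it but some SDKs insist one is set:

# Point any OpenAI-SDK-based tool at the local server
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama   # placeholder; Ollama doesn't check it

Check your framework's docs for the exact setting names; some take the base URL as a constructor argument or config field instead.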
Point OpenClaw at a local Ollama model. No cloud provider involved; no API key required.
# OpenClaw talking to local Ollama — no cloud, no real credentials
# Pull once, then run against the local server at localhost:11434.
ollama pull qwen3.5:8b
openclaw config set backend local-ollama
openclaw config set model qwen3.5:8b
openclaw run "organize my Downloads folder"
# All traffic stays on localhost. Nothing leaves the machine.
You now have the full local stack: Ollama for the model, OpenClaw or your framework of choice for the agent, MCP for the tools. Next is the builder capstone — design (not code) an agent for your own life.