Ollama turns 'I want to run an LLM locally' into a one-line install and a two-word command. Here's the stack, the key commands, and the models worth pulling first.
Ollama is a command-line tool that downloads, manages, and serves local LLMs. Under the hood it uses llama.cpp, a fast and widely used open-source inference runtime, and ships models in GGUF format. You get a localhost API that any app can call, including Claude Code, CrewAI, LangGraph, and OpenClaw.

    # macOS: one-line install
    brew install ollama
    # Or download the app from ollama.com

    # Start the background server
    ollama serve &

    # Pull and run Llama 4 (8B, works on most modern laptops)
    ollama run llama4:8b
    # You're now chatting with a local model. Ctrl-D to exit.

From zero to local model in three commands.

    ollama list                 # what's installed
    ollama pull qwen3.5:8b      # download (no run)
    ollama run gemma4:4b        # download if needed, then chat
    ollama rm llama4:70b        # free disk space
    ollama ps                   # what's loaded in memory
    ollama show qwen3.5:8b      # details: params, context, quant

    # API access (OpenAI-compatible)
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "qwen3.5:8b",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

Core Ollama commands and the OpenAI-compatible API.

| Model | Size | Best for |
|---|---|---|
| llama4:8b | ~5 GB | General chat, balanced speed/quality. |
| qwen3.5:8b | ~5 GB | Strong function calling + coding. |
| gemma4:12b | ~8 GB | Google's frontier-at-size model, reasoning-tuned. |
| qwen3.5:32b | ~20 GB | Near-frontier quality on a 32 GB Mac. |
| deepseek-coder:16b | ~10 GB | Code-focused. Fast on a laptop GPU. |
| llama4:70b | ~40 GB | Highest quality. Needs a Mac Studio or a real GPU box. |
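
One rough way to read the table is to filter it by how much memory you can spare. The sketch below hard-codes the approximate sizes from the table and keeps the models whose size, multiplied by a headroom factor, fits a given budget. The 1.5x headroom and the name `models_that_fit` are assumptions made for illustration, not Ollama guidance; real memory use also depends on context length and quantization.

```python
# Approximate download sizes (GB), copied from the table above.
MODEL_SIZES_GB = {
    "llama4:8b": 5,
    "qwen3.5:8b": 5,
    "gemma4:12b": 8,
    "qwen3.5:32b": 20,
    "deepseek-coder:16b": 10,
    "llama4:70b": 40,
}


def models_that_fit(memory_gb: float, headroom: float = 1.5) -> list[str]:
    """Return model tags whose size times `headroom` fits in `memory_gb`.

    `headroom` is an assumed safety factor (hypothetical, not an Ollama
    setting) that leaves room for the context cache and other programs.
    Results are sorted largest first.
    """
    fitting = [
        (size, tag)
        for tag, size in MODEL_SIZES_GB.items()
        if size * headroom <= memory_gb
    ]
    fitting.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _size, tag in fitting]


if __name__ == "__main__":
    # Example: which models are plausible on a 16 GB laptop?
    print(models_that_fit(16))
```

With these assumptions, a 32 GB budget admits `qwen3.5:32b` but not `llama4:70b`, which matches the table's notes. Treat the output as a starting shortlist and confirm with `ollama ps` once a model is loaded.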
Because Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, it drops into any agent framework that supports OpenAI. Point CrewAI, LangGraph, OpenClaw, or AutoGen at that base URL and they will run against your local model. llama.cpp merged full MCP client support in March 2026, so you can plug MCP servers (GitHub, Notion, Supabase) into a local Qwen or Llama too.
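
As a minimal sketch of what pointing an app at that URL looks like, the Python below sends one chat request to the OpenAI-compatible endpoint using only the standard library. It assumes `ollama serve` is running on the default port and that `qwen3.5:8b` has already been pulled. The helper names `build_chat_request` and `chat` are illustrative and not part of any Ollama client library.

```python
import json
import urllib.request

# Default local endpoint; the /v1 prefix is the OpenAI-compatible surface.
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(
    model: str, prompt: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running local server and a pulled model.
    print(chat("qwen3.5:8b", "Hello"))
```

Frameworks that accept an OpenAI base URL generally take the same `http://localhost:11434/v1` value. Some also insist on a non-empty API key string; a local Ollama server does not check it, so a placeholder is typically enough.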

    # OpenClaw talking to local Ollama: no cloud, no real credentials
    # Pull once, then run against the local server at localhost:11434.
    ollama pull qwen3.5:8b
    openclaw config set backend local-ollama
    openclaw config set model qwen3.5:8b
    openclaw run "organize my Downloads folder"
    # All traffic stays on localhost. Nothing leaves the machine.

Point OpenClaw at a local Ollama model. No cloud provider involved; no API key required.

You now have the full local stack: Ollama for the model, OpenClaw or your framework of choice for the agent, MCP for the tools. Next is the builder capstone: design (not code) an agent for your own life.

6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-agentic-ollama-basics-builders
What is the main idea of "Ollama Basics: Running a Model Yourself"?
Which concept is most central to "Ollama Basics: Running a Model Yourself"?
What should a careful learner remember about "LM Studio is the GUI cousin"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about Ollama be treated?
Name one way to verify an AI answer about Ollama.