Ollama Basics: Running a Model Yourself

Ollama turns 'I want to run an LLM locally' into a one-line install and a two-word command. Here's the stack, the key commands, and the models worth pulling first.

32 min · Reviewed 2026

What Ollama is

Ollama is a command-line tool that downloads, manages, and serves local LLMs. Under the hood it uses llama.cpp, a widely used open-source inference runtime, and ships models in GGUF format. You get a localhost API that any app can call, including Claude Code, CrewAI, LangGraph, and OpenClaw.
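
To make the GGUF mention concrete, here is a minimal Python sketch that reads the fixed-size header at the start of a GGUF file: the 4-byte magic GGUF, a little-endian 32-bit version, then 64-bit tensor and metadata counts. It assumes GGUF version 2 or later, and read_gguf_header is an illustrative name, not part of Ollama or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes GGUF version 2 or later, where both counts are 64-bit.
    Raises ValueError if the file does not start with the GGUF magic.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4 magic + 4 version + 8 + 8 count bytes
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<" = little-endian, I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count
```

Point it at any .gguf file you have on disk; a ValueError means the file is not in GGUF format.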

Install and first model

# macOS: one-line install
brew install ollama

# Or download the app from ollama.com

# Start the background server
ollama serve &

# Pull and run Llama 4 (8B — works on most modern laptops)
ollama run llama4:8b

# You're now chatting with a local model. Ctrl-D to exit.

From zero to local model in three commands.
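
Before pointing other tools at Ollama, it can help to confirm that the background server started by ollama serve is reachable. A minimal Python sketch, assuming the default address http://localhost:11434 (the one used in the API example later in this lesson); server_is_up is a hypothetical helper name.

```python
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def server_is_up(base_url=OLLAMA_URL, timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not running".
        return False
```

Calling server_is_up() with no arguments checks the default address; a False result usually means ollama serve has not been started yet.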

The commands worth knowing

ollama list                    # what's installed
ollama pull qwen3.5:8b         # download (no run)
ollama run gemma4:4b           # download if needed, then chat
ollama rm llama4:70b           # free disk space
ollama ps                      # what's loaded in memory
ollama show qwen3.5:8b         # details: params, context, quant

# API access (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:8b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Core Ollama commands and the OpenAI-compatible API.
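
The same request can be made from code. Here is a minimal Python sketch of the curl call above using only the standard library. It assumes the server is running locally and the model has already been pulled; build_chat_request and chat are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # same host and port as the curl call


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt, base_url=BASE_URL, timeout=120):
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server up, chat("qwen3.5:8b", "Hello") returns the reply as a string. Building the request separately from sending it makes the payload easy to inspect without a running server.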

Models to try (April 2026)

Model	Size	Best for
llama4:8b	~5 GB	General chat, balanced speed/quality.
qwen3.5:8b	~5 GB	Strong function calling + coding.
gemma4:12b	~8 GB	Google's frontier-at-size model, reasoning-tuned.
qwen3.5:32b	~20 GB	Near-frontier quality on a 32 GB Mac.
deepseek-coder:16b	~10 GB	Code-focused. Fast on a laptop GPU.
llama4:70b	~40 GB	Highest quality. Needs a Mac Studio or a real GPU box.
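
The sizes in the table follow from simple arithmetic: the weights take roughly parameters × bits per weight ÷ 8 bytes. This lesson's own example is a 70B model, which is about 140 GB at 16-bit and about 40 GB when quantized to 4-bit. Here is a sketch in Python; the 8 GB headroom default is an assumption for illustration, not an Ollama rule.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(download_gb, ram_gb, headroom_gb=8.0):
    """Rough check: model size plus headroom must fit in RAM.

    headroom_gb is an assumed allowance for the OS, other apps, and the
    model's context cache; tune it for your machine.
    """
    return download_gb + headroom_gb <= ram_gb


print(weights_gb(70, 16))  # 16-bit weights for a 70B model
print(weights_gb(70, 4))   # raw 4-bit weights for the same model
```

The raw 4-bit figure comes out a little under the ~40 GB in the table because 4-bit formats also store scaling data and keep some tensors at higher precision.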

Using Ollama with agents

Because Ollama exposes an OpenAI-compatible API at localhost:11434, it drops into any agent framework that supports OpenAI. Point CrewAI, LangGraph, OpenClaw, or AutoGen at that address (the OpenAI-style routes live under /v1, as in the curl example above) and they'll run against your local model. In March 2026, llama.cpp merged full MCP client support, so you can plug MCP servers (GitHub, Notion, Supabase) into a local Qwen or Llama too.
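
At its core, an agent framework using that API sends tool definitions with the chat request and reads back any tool calls the model proposes. Here is a minimal Python sketch of that exchange. It assumes the OpenAI function-calling message shape and a model that supports tool calling. The list_files tool and both helper names are hypothetical, and no framework's actual internals are shown.

```python
import json


def build_tool_request(model, prompt, tools):
    """Return the JSON body for a chat request that offers tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }


def extract_tool_calls(response):
    """Return [(function_name, arguments_dict), ...] from a chat response."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn["arguments"]
        # OpenAI-style responses encode arguments as a JSON string.
        if isinstance(args, str):
            args = json.loads(args)
        calls.append((fn["name"], args))
    return calls


# Hypothetical tool definition in the OpenAI function-calling schema.
LIST_FILES_TOOL = {
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
```

POST the body from build_tool_request to http://localhost:11434/v1/chat/completions, pass the parsed response to extract_tool_calls, and run whichever tools you have decided the agent is allowed to use.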

# OpenClaw talking to local Ollama — no cloud, no real credentials
# Pull once, then run against the local server at localhost:11434.

ollama pull qwen3.5:8b

openclaw config set backend local-ollama
openclaw config set model qwen3.5:8b

openclaw run "organize my Downloads folder"

# All traffic stays on localhost. Nothing leaves the machine.

Point OpenClaw at a local Ollama model. No cloud provider involved; no API key required.

You now have the full local stack: Ollama for the model, OpenClaw or your framework of choice for the agent, MCP for the tools. Next is the builder capstone — design (not code) an agent for your own life.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-agentic-ollama-basics-builders

What is the core idea behind "Ollama Basics: Running a Model Yourself"?
1. Ollama turns 'I want to run an LLM locally' into a one-line install and a two-word command. Here's the stack, the key commands, and the models worth pulling first.
2. Secondary classifier: a cheap model scans agent output for suspicious actions be…
3. Drill the revocation process so it works when needed
4. AI policy
Which term best describes a foundational idea in "Ollama Basics: Running a Model Yourself"?
1. llama.cpp
2. Ollama
3. GGUF
4. quantization
A learner studying Ollama Basics: Running a Model Yourself would need to understand which concept?
1. Ollama
2. GGUF
3. llama.cpp
4. quantization
Which of these is directly relevant to Ollama Basics: Running a Model Yourself?
1. Ollama
2. llama.cpp
3. quantization
4. GGUF
What is the key insight about "LM Studio is the GUI cousin" in the context of Ollama Basics: Running a Model Yourself?
1. Prefer point-and-click? LM Studio offers the same thing with a polished UI and a slightly faster MLX backend on Apple Si…
2. Secondary classifier: a cheap model scans agent output for suspicious actions be…
3. Drill the revocation process so it works when needed
4. AI policy
What is the key insight about "Quantization matters" in the context of Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. A 70B model 'quantized to 4-bit' takes ~40 GB and runs. At full 16-bit it would be 140 GB.
3. Drill the revocation process so it works when needed
4. AI policy
What is the key warning about "Define the guardrails first" in the context of Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. Drill the revocation process so it works when needed
3. Before an agent runs, spell out what it's allowed to read, write, and delete.
4. AI policy
Which statement accurately describes an aspect of Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. Drill the revocation process so it works when needed
3. AI policy
4. Ollama is a command-line tool that downloads, manages, and serves local LLMs. Under the hood it uses llama.
What does working with Ollama Basics: Running a Model Yourself typically involve?
1. Because Ollama exposes an OpenAI-compatible API at localhost:11434, it drops into any agent framework that supports OpenAI.
2. Secondary classifier: a cheap model scans agent output for suspicious actions be…
3. Drill the revocation process so it works when needed
4. AI policy
Which of the following is true about Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. You now have the full local stack: Ollama for the model, OpenClaw or your framework of choice for the agent, MCP for the tools.
3. Drill the revocation process so it works when needed
4. AI policy
Which best describes the scope of "Ollama Basics: Running a Model Yourself"?
1. It is unrelated to agentic workflows
2. It applies only to the opposite beginner tier
3. It focuses on how Ollama turns 'I want to run an LLM locally' into a one-line install and a two-word command. Here's t
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. Drill the revocation process so it works when needed
3. AI policy
4. Install and first model
Which section heading best belongs in a lesson about Ollama Basics: Running a Model Yourself?
1. The commands worth knowing
2. Secondary classifier: a cheap model scans agent output for suspicious actions be…
3. Drill the revocation process so it works when needed
4. AI policy
Which section heading best belongs in a lesson about Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. Models to try (April 2026)
3. Drill the revocation process so it works when needed
4. AI policy
Which section heading best belongs in a lesson about Ollama Basics: Running a Model Yourself?
1. Secondary classifier: a cheap model scans agent output for suspicious actions be…
2. Drill the revocation process so it works when needed
3. Using Ollama with agents
4. AI policy

Tendril · Builders · Agentic AI