Ollama: The Easy On-Ramp to Local Models
Ollama is the curl-and-go answer to running an LLM on your own machine. Here is what it actually does, the commands that matter, and the seams you will hit when you push it.
What this lesson covers
- What Ollama is and is not
- Ollama as a model runner
- The OpenAI-compatible API
What Ollama is and is not
Ollama is a small, polished CLI and background service that downloads model weights, manages them on disk, and serves them over an OpenAI-compatible HTTP API. It bundles llama.cpp under the hood as the actual inference engine. What Ollama gives you on top is the developer experience — naming, versioning, a clean install, and a curated library of models you can pull by short name.
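Because the server speaks the OpenAI wire format, any HTTP client can talk to it. As a quick sanity check, here is a minimal sketch that lists installed models; it assumes Ollama is running on the default port 11434 and that your Ollama version exposes the OpenAI-compatible /v1/models route.

```python
# sanity_check.py — assumes Ollama is serving on the default port 11434
# and that this Ollama version exposes the OpenAI-compatible /v1/models route.
import requests

resp = requests.get("http://localhost:11434/v1/models", timeout=5)
resp.raise_for_status()

# The response mirrors OpenAI's model-list shape: {"object": "list", "data": [...]}
for model in resp.json().get("data", []):
    print(model["id"])  # e.g. "llama3.1:8b"
```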
The five commands that cover most of life
Ollama treats local models like Docker treats containers: pull by name, run anywhere.
```bash
# Install (macOS shown)
brew install ollama
ollama serve &          # background server on localhost:11434

# Pull and run a model
ollama run llama3.1:8b

# Manage models
ollama list             # what's installed
ollama rm <model>       # free disk
ollama show <model>     # parameters, quantization, context

# Use the API from any code
curl http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role":"user","content":"Hello"}]
  }'
```

Compare the options
| Need | Ollama | Native llama.cpp |
|---|---|---|
| First model running | Minutes | An afternoon |
| Switching between models | One command | Manual file management |
| Custom prompt template | Modelfile | Command-line flags |
| Squeezing maximum performance | Decent defaults | Full control |
| Fits in a containerized deployment | Excellent | Workable |
Modelfiles: customizing without forking
A Modelfile bakes in the system prompt and parameters so callers do not have to remember them.
```
FROM llama3.1:8b

SYSTEM """
You are a careful editor. Always ask for the source
before making any factual claim. Refuse to invent quotes.
"""

PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```
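Once the Modelfile is saved, `ollama create careful-editor -f Modelfile` registers it under a new name (`careful-editor` is just an illustrative name), and `ollama run careful-editor` or the same HTTP API serves it with the baked-in prompt and parameters.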
Apply this
1. Install Ollama and run two different models — one small, one larger.
2. Wire an existing OpenAI-SDK script to talk to Ollama by changing only the base URL (see the sketch after this list).
3. Write a Modelfile that pins a system prompt and a temperature for one of those models.
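For step 2, the change really is just the base URL plus a throwaway API key, since Ollama does not check it. A minimal sketch with the official `openai` Python package; the model name assumes you pulled `llama3.1:8b` as above.

```python
# chat_local.py — points the OpenAI SDK at the local Ollama server.
# Assumes `ollama serve` is running and `llama3.1:8b` has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

Swap the base URL back and the same script talks to the hosted API again.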
The big idea: Ollama is the path of least resistance into local LLMs. Start here, learn the seams, then graduate where you need more control.