Local Function Calling and Structured Output: Making Small Models Reliable
Tool use and JSON output are not just frontier-cloud features. Modern Ollama and llama.cpp support both — with sharper constraints that pay off in reliability.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The reliability problem
2. Function calling
3. Structured output
4. JSON schema
Section 1
The reliability problem
When a frontier cloud model returns JSON, it almost always parses. When a 7B local model returns JSON, it sometimes adds a stray comma, drops a closing brace, or wraps the whole thing in apologetic prose. The model is not 'wrong'; it is undertrained for that exact format. The fix is constrained decoding — telling the inference engine to only allow tokens that keep the output valid.
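Before moving the constraint into the engine, it is worth seeing what the unconstrained approach looks like in practice. The sketch below is a hypothetical defensive parser (the `extract_json` helper is invented for illustration, not part of any library): it strips the prose wrapper a small model often adds and attempts a parse, which works some of the time and fails silently the rest.

```python
import json

def extract_json(raw: str):
    """Best-effort JSON extraction from a model reply that may wrap
    the payload in prose. Returns None when no valid JSON is found."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end == -1 or end < start:
        return None
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None

# Typical small-model replies:
print(extract_json('Sure! Here is the JSON: {"intent": "book"}'))  # {'intent': 'book'}
print(extract_json('{"intent": "book",}'))  # None -- trailing comma, parse fails
```

Every `None` here is silent data loss, which is why the rest of this lesson moves validation out of your parsing code and into the decoder.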
Three levels of structured output
1. Hope: prompt the model nicely and parse the result. Fast to implement, brittle in production.
2. JSON mode: ask Ollama to enforce JSON-shaped output. Better, but with no schema constraint.
3. Grammar / schema: feed a JSON schema or a GBNF grammar to the engine. Output is syntactically valid by construction.
Schema-constrained output via the OpenAI-compatible API. The local engine enforces validity.

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

user_text = "I need to move my appointment to next week."  # example input

schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["book", "cancel", "reschedule"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["intent", "confidence"],
}

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": user_text}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "intent", "schema": schema},
    },
)

# Safe to parse without a fallback: the engine guarantees valid JSON.
result = json.loads(resp.choices[0].message.content)
```

Compare the options
| Approach | Setup cost | Failure mode | Throughput cost |
|---|---|---|---|
| Hope + parse | Trivial | Invalid JSON sometimes | None |
| JSON mode (no schema) | Low | Wrong shape, valid syntax | Negligible |
| JSON schema constrained | Low | Semantic errors only (syntax guaranteed) | Small |
| GBNF grammar | Medium | Can be too restrictive | Small |
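The table's last row mentions GBNF grammars without showing one. Below is a small hand-written grammar for the same intent/confidence object, as an illustration of the format; the rule names are our choice, and in practice llama.cpp also ships a generic `json.gbnf` plus a schema-to-grammar converter, so you rarely write these from scratch.

```
root   ::= "{" ws "\"intent\"" ws ":" ws intent ws "," ws "\"confidence\"" ws ":" ws number ws "}"
intent ::= "\"book\"" | "\"cancel\"" | "\"reschedule\""
number ::= "0" ("." [0-9]+)? | "1" (".0")?
ws     ::= [ \t\n]*
```

Passed to llama.cpp with `--grammar-file`, this restricts decoding so that no token sequence outside the grammar can ever be emitted. The trade-off is the table's "can be too restrictive" failure mode: a grammar this tight leaves the model no way to express "none of the above."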
Function calling on small models
Modern Ollama supports OpenAI-style tool calls — same shape as the cloud APIs. Many open-weight models (Llama 3.1+, Qwen 2.5+, Mistral) are tuned for tool use and work well. Older or off-brand models will hallucinate the wire format. Always test the specific model you plan to ship; do not assume that 'tool calling works' generalizes.
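The shape below follows the OpenAI-style tools API that Ollama mirrors; the `get_weather` function and its schema are invented for illustration. Because the wire format is exactly what weaker models get wrong, the dispatch side validates the tool name and argument JSON before executing anything.

```python
import json

# A local function the model should be able to call.
# The name and schema are illustrative, not from any library.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> dict:
    """Validate and run one tool call from the model.

    Small models sometimes emit unknown tool names or malformed
    argument JSON, so both are checked before executing."""
    fn = REGISTRY.get(tool_call["function"]["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {tool_call['function']['name']}")
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Against a live Ollama server the round trip looks like (not run here):
# client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# resp = client.chat.completions.create(
#     model="llama3.1:8b",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=TOOLS,
# )
# for tc in resp.choices[0].message.tool_calls:
#     print(dispatch({"function": {"name": tc.function.name,
#                                  "arguments": tc.function.arguments}}))

call = {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
print(dispatch(call))  # {'city': 'Oslo', 'forecast': 'sunny'}
```

Keeping the registry explicit, rather than dispatching on whatever name the model emits, is what turns a hallucinated tool call into a catchable error instead of an arbitrary code path.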
Apply this
1. Pick a real classification task in your work and write a JSON schema for the output.
2. Implement it on Ollama with a response_format JSON schema.
3. Compare reliability against the same task with no schema constraint, on 50 inputs.
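Step 3 needs a scoring rule. A minimal sketch, assuming you have collected the raw model outputs as strings; "reliable" here means the output both parses and matches the intent schema from earlier in the lesson. The hand-rolled checks stand in for a full JSON Schema validator.

```python
import json

REQUIRED = {"intent", "confidence"}
INTENTS = {"book", "cancel", "reschedule"}

def is_valid(raw: str) -> bool:
    """True when an output parses as JSON and matches the intent schema."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or not REQUIRED <= obj.keys():
        return False
    return (obj["intent"] in INTENTS
            and isinstance(obj["confidence"], (int, float))
            and 0 <= obj["confidence"] <= 1)

def reliability(outputs: list[str]) -> float:
    """Fraction of valid outputs -- run once on the constrained batch
    and once on the unconstrained batch over the same 50 inputs."""
    return sum(is_valid(o) for o in outputs) / len(outputs)

print(reliability(['{"intent": "book", "confidence": 0.9}',
                   'Sure! {"intent": "book"}']))  # 0.5
```

Counting schema conformance, not just parse success, matters: JSON mode without a schema will score well on `json.loads` and still fail this check.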
The big idea: small local models become reliable when you constrain their output. The cloud's apparent reliability is partly that the engine already does this for you.
Related lessons
Keep going
Creators · 9 min
Hermes For Structured JSON Output: Schemas That Work
When you need data, not prose, an open-weight model has to play by a schema. Hermes is one of the more reliable choices — but only if you prompt it carefully.
Creators · 40 min
Tool Calling Quality Across Frontier Models
Tool calling quality varies across frontier models. Selection by use case improves reliability.
Creators · 30 min
Structured Output Modes: JSON Mode, Schema, Tool Forcing
How vendors implement structured output and which mode to pick per use case.
