Tool use and JSON output are not just frontier-cloud features. Modern Ollama and llama.cpp support both — with sharper constraints that pay off in reliability.
When a frontier cloud model returns JSON, it almost always parses. When a 7B local model returns JSON, it sometimes adds a stray comma, drops a closing brace, or wraps the whole thing in apologetic prose. The model is not 'wrong'; it is undertrained for that exact format. The fix is constrained decoding — telling the inference engine to only allow tokens that keep the output valid.
```python
import json

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API on port 11434.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["book", "cancel", "reschedule"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["intent", "confidence"],
}

user_text = "I need to move my appointment to Friday"  # example input

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": user_text}],
    response_format={"type": "json_schema", "json_schema": {"name": "intent", "schema": schema}},
)
result = json.loads(resp.choices[0].message.content)
```

Schema-constrained output via the OpenAI-compatible API. The local engine enforces validity.

| Approach | Setup cost | Failure mode | Throughput cost |
|---|---|---|---|
| Hope + parse | Trivial | Invalid JSON sometimes | None |
| JSON mode (no schema) | Low | Wrong shape, valid syntax | Negligible |
| JSON schema constrained | Low | Semantic errors only (syntax and shape guaranteed) | Small |
| GBNF grammar | Medium | Can be too restrictive | Small |
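For the last row, llama.cpp's server accepts a GBNF grammar directly in its `/completion` request. A minimal sketch, assuming a llama.cpp server running on the default port; the grammar itself is deliberately tiny and illustrative, and real grammars usually describe a full JSON object:

```python
import json
import urllib.request

# Illustrative GBNF grammar: the model may emit only one of three quoted strings.
INTENT_GRAMMAR = r'''
root   ::= "\"" intent "\""
intent ::= "book" | "cancel" | "reschedule"
'''

def build_request(prompt: str) -> dict:
    """Payload for llama.cpp's /completion endpoint; "grammar" takes a GBNF string."""
    return {"prompt": prompt, "grammar": INTENT_GRAMMAR, "n_predict": 8}

def classify(prompt: str, url: str = "http://localhost:8080/completion") -> str:
    """POST the constrained request to a locally running llama.cpp server."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The trade-off the table flags is visible here: this grammar can never emit anything outside the three strings, even when "none of the above" would be the honest answer.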
Modern Ollama supports OpenAI-style tool calls — same shape as the cloud APIs. Many open-weight models (Llama 3.1+, Qwen 2.5+, Mistral) are tuned for tool use and work well. Older or off-brand models will hallucinate the wire format. Always test the specific model you plan to ship; do not assume that 'tool calling works' generalizes.
The big idea: small local models become reliable when you constrain their output. The cloud's apparent reliability is partly that the engine already does this for you.