Hermes For Structured JSON Output: Schemas That Work
When you need data, not prose, an open-weight model has to play by a schema. Hermes is one of the more reliable choices — but only if you prompt it carefully.
Lesson map
What this lesson covers, in order:
1. Why JSON-from-LLMs is harder than it looks
2. Structured output
3. JSON schema
4. Grammar-constrained decoding
Why JSON-from-LLMs is harder than it looks
Asking a model for JSON is easy. Asking it for a JSON object that always matches your schema is hard. Frontier API models offer schema-strict modes; open-weight models often need help. Hermes is responsive to good instructions, and when paired with grammar-constrained decoding (available in llama.cpp / Ollama), it can be very reliable.
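To see why "mostly JSON" is not good enough, here is a minimal sketch of the most common failure: the model returns valid JSON wrapped in a code fence, which a strict parser rejects outright.

```python
import json

# A typical near-miss: the payload inside is valid JSON,
# but the model wrapped it in a Markdown code fence.
raw = '```json\n{"id": "x1", "category": "b"}\n```'

try:
    json.loads(raw)
    parsed = True
except json.JSONDecodeError:
    parsed = False

# Strict parsing rejects the fenced output even though
# the object inside is exactly what we asked for.
print(parsed)  # False
```

This is why the prompt skeleton later in this lesson says "do not wrap in code fences" explicitly.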
Schemas that work in practice
1. Keep the schema flat where possible — fewer levels of nesting means fewer chances for the model to drop a brace.
2. Use enum lists for categorical fields — 'category' should be one of a fixed list, not free text.
3. Always include an 'id' field that echoes the input — easier to map outputs back.
4. Add a 'confidence' field when you can — useful for routing low-confidence cases to a human.
5. Provide ONE example of the exact output you want, formatted as the schema, in the prompt.
An example output beats three sentences of explanation about the schema.
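The checklist above can be written down as a JSON Schema and checked in code. A minimal sketch — the field names and enum values mirror the prompt skeleton below, and the hand-rolled `matches_schema` check is just an illustration (a library like `jsonschema` generalizes it):

```python
# JSON Schema for the flat output shape described above.
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "category": {"enum": ["a", "b", "c"]},
        "summary": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["id", "category", "summary", "confidence"],
    "additionalProperties": False,
}

def matches_schema(obj) -> bool:
    """Hand-rolled check for this one flat schema — stdlib only."""
    if not isinstance(obj, dict) or set(obj) != set(SCHEMA["required"]):
        return False
    return (
        isinstance(obj["id"], str)
        and obj["category"] in ("a", "b", "c")
        and isinstance(obj["summary"], str)
        and isinstance(obj["confidence"], (int, float))
        and 0.0 <= obj["confidence"] <= 1.0
    )

good = {"id": "x1", "category": "b", "summary": "...", "confidence": 0.78}
bad = {"id": "x1", "category": "weather", "summary": "...", "confidence": 0.78}
print(matches_schema(good), matches_schema(bad))  # True False
```

Note how the enum constraint catches the free-text `"weather"` category that a plain `"type": "string"` would have let through.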
Prompt skeleton:
SYSTEM: You will receive an input. Return ONLY a JSON object
matching this schema. Do not add commentary, do not wrap in code fences:
{
"id": string, // echo input id
"category": one of ["a","b","c"],
"summary": string (max 30 words),
"confidence": number 0.0-1.0
}
Example output for an input id="x1":
{"id":"x1","category":"b","summary":"...","confidence":0.78}
Now process the input below.

Grammar-constrained decoding
llama.cpp supports grammar-constrained decoding (GBNF grammars): the sampler cannot emit a token that would violate the grammar, so schema violations become impossible rather than merely unlikely. When available, it is the strongest reliability tool in your kit. Both Ollama and LM Studio expose this capability as structured output.
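As a sketch of what this looks like in practice: recent Ollama versions accept a JSON Schema in the `format` field of a chat request and compile it into a decoding constraint. The payload below only builds the request — the model tag `hermes3` and the endpoint call in the comment are illustrative, not prescriptive.

```python
import json

# Output schema from this lesson, reused as the decoding constraint.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "category": {"enum": ["a", "b", "c"]},
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["id", "category", "summary", "confidence"],
}

# Request payload for Ollama's /api/chat endpoint.
# Assumes an Ollama version with structured outputs (0.5+);
# "hermes3" is a placeholder model tag.
payload = {
    "model": "hermes3",
    "messages": [{"role": "user", "content": "Classify the input with id=x1: ..."}],
    "format": schema,   # Ollama turns this into a sampling constraint
    "stream": False,
}

# To send it: requests.post("http://localhost:11434/api/chat", json=payload)
print(json.dumps(payload)[:40])
```

Because the constraint is enforced at decode time, the response body's `message.content` should always parse against the schema — you still validate, but failures should drop to near zero.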
Compare the options
| Approach | Reliability | Setup effort | Trade-off |
|---|---|---|---|
| Plain prompt with example | Good | Low | Occasional drift on edge cases |
| Prompt + retry on parse failure | Better | Low | Slower on bad runs |
| Grammar-constrained decoding | Best | Medium | Schema must be expressible as a grammar |
| Full schema-validating loop | Excellent | Higher | Most code to maintain |
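The "prompt + retry on parse failure" row can be sketched as a small loop. Here `call_model` is a stand-in for whatever client you use; it deliberately returns a fenced (unparseable) answer on the first attempt to show the retry path.

```python
import json

def call_model(prompt: str, attempt: int) -> str:
    """Stand-in for a real model call; the first attempt drifts."""
    if attempt == 0:
        return '```json\n{"id": "x1"}\n```'  # fenced -> parse failure
    return '{"id": "x1", "category": "b", "summary": "ok", "confidence": 0.9}'

def get_json(prompt: str, max_retries: int = 2) -> dict:
    last_error = None
    for attempt in range(max_retries + 1):
        raw = call_model(prompt, attempt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = e  # optionally feed the error back into the prompt
    raise ValueError(f"no valid JSON after {max_retries + 1} attempts: {last_error}")

result = get_json("classify ...")
print(result["category"])  # b
```

In a real loop you would also run the schema check (not just `json.loads`) before accepting, and count retries so you can measure drift.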
Applied exercise
1. Pick a real classification or extraction task you do.
2. Define a flat JSON schema for the output.
3. Prompt Hermes with the skeleton above and run on 25 inputs.
4. Compute the schema-failure rate. If it is above 5%, try grammar-constrained decoding and recompute.
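Computing the failure rate in step 4 is a few lines. This sketch counts only parse failures; in practice you would swap `json.loads` for the full schema check.

```python
import json

def schema_failure_rate(outputs) -> float:
    """Fraction of raw model outputs that fail to parse as JSON."""
    failures = 0
    for raw in outputs:
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            failures += 1
    return failures / len(outputs)

# Toy run: 1 bad output out of 4 -> 25% failure rate,
# which is above the lesson's 5% threshold.
runs = ['{"id": "a"}', '{"id": "b"}', 'not json', '{"id": "d"}']
rate = schema_failure_rate(runs)
print(rate)  # 0.25
```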
The big idea: structured output from open-weight models is solvable. Use grammar constraints when you can, validate always, and never trust the model to remember the schema mid-stream.