Lesson 329 of 1596
Hermes For Structured JSON Output: Schemas That Work
When you need data, not prose, an open-weight model has to play by a schema. Hermes is one of the more reliable choices — but only if you prompt it carefully.
Creators · Model Families · ~5 min read
Why JSON-from-LLMs is harder than it looks
Asking a model for JSON is easy. Asking it for a JSON object that always matches your schema is hard. Frontier API models offer schema-strict modes; open-weight models often need help. Hermes is responsive to good instructions, and when paired with grammar-constrained decoding (available in llama.cpp / Ollama), it can be very reliable.
Schemas that work in practice
- 1Keep the schema flat where possible — fewer levels of nesting means fewer chances for the model to drop a brace.
- 2Use enum lists for categorical fields — 'category' should be one of a fixed list, not free text.
- 3Always include an 'id' field that echoes input — easier to map outputs back.
- 4Add a 'confidence' field when you can — useful for routing low-confidence cases to a human.
- 5Provide ONE example of the exact output you want, formatted as the schema, in the prompt.
An example output beats three sentences of explanation about the schema.
Prompt skeleton: SYSTEM: You will receive an input. Return ONLY a JSON object matching this schema. Do not add commentary, do not wrap in code fences: { "id": string, // echo input id "category": one of ["a","b","c"], "summary": string (max 30 words), "confidence": number 0.0-1.0 } Example output for an input id="x1": {"id":"x1","category":"b","summary":"","confidence":0.78} Now process the input below.Grammar-constrained decoding
llama.cpp supports a grammar feature that physically prevents the model from emitting tokens that violate a JSON schema. When available, it is the strongest reliability tool in your kit — schema violations become impossible, not unlikely. Both Ollama and LM Studio expose access to this feature.
Compare the options
| Approach | Reliability | Setup effort | Trade-off |
|---|---|---|---|
| Plain prompt with example | Good | Low | Occasional drift on edge cases |
| Prompt + retry on parse failure | Better | Low | Slower on bad runs |
| Grammar-constrained decoding | Best | Medium | Schema must be expressible as a grammar |
| Full schema-validating loop | Excellent | Higher | Most code to maintain |
Applied exercise
- 1Pick a real classification or extraction task you do.
- 2Define a flat JSON schema for the output.
- 3Prompt Hermes with the skeleton above and run on 25 inputs.
- 4Compute schema-failure rate. If above 5%, try grammar-constrained decoding and recompute.
Key terms in this lesson
The big idea: structured output from open-weight models is solvable. Use grammar constraints when you can, validate always, and never trust the model to remember the schema mid-stream.
End-of-lesson quiz
Check what stuck
8 questions · Score saves to your progress.
Tutor
Curious about “Hermes For Structured JSON Output: Schemas That Work”?
Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.
Progress saved locally in this browser. Sign in to sync across devices.
Related lessons
Keep going
Adults & Professionals · 10 min
Local Function Calling and Structured Output: Making Small Models Reliable
Tool use and JSON output are not just frontier-cloud features. Modern Ollama and llama.cpp support both — with sharper constraints that pay off in reliability.
Creators · 30 min
Structured Output Modes: JSON Mode, Schema, Tool Forcing
How vendors implement structured output and which mode to pick per use case.
Creators · 11 min
AI structured output modes across model families
Compare strict JSON modes across Claude, GPT, and Gemini.
