Force an LLM to return JSON that matches a schema. Zod + tool-use or JSON mode makes this reliable.
Asking an LLM for JSON and parsing the string is a trap. Use the AI SDK's generateObject with a Zod schema — you get validated, typed data every time.
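For contrast, here is a sketch of the hand-parsing approach this warns against, in plain TypeScript with no SDK (parsePeople and the sample strings are illustrative):

```typescript
// The manual approach: pull JSON out of prose, parse it, and hand-check
// every field. Every line of this is boilerplate a schema eliminates.
interface Person {
  name: string;
  role?: string;
}

function parsePeople(raw: string): Person[] {
  // Models often wrap JSON in prose or markdown fences; strip it by hand.
  const match = raw.match(/\[[\s\S]*\]/);
  if (!match) throw new Error("no JSON array found in model output");

  let data: unknown;
  try {
    data = JSON.parse(match[0]);
  } catch {
    throw new Error("model returned malformed JSON");
  }

  // Hand-written type guard, re-implemented for every response shape.
  if (!Array.isArray(data)) throw new Error("expected an array");
  return data.map((item, i) => {
    if (typeof item !== "object" || item === null || typeof (item as any).name !== "string") {
      throw new Error(`item ${i} is missing a string "name"`);
    }
    return item as Person;
  });
}

// Works when the model cooperates with this exact wrapping...
parsePeople('Sure! Here you go: [{"name": "Ada Lovelace", "role": "mathematician"}]');
// ...and breaks the moment it formats the reply differently.
```

Every failure mode here (chatty wrapping, malformed JSON, missing fields) becomes a runtime surprise instead of a typed, validated result.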
import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const ExtractSchema = z.object({
  people: z.array(
    z.object({
      name: z.string(),
      role: z.string().optional(),
      email: z.string().email().optional(),
    })
  ),
  summary: z.string().max(240),
});
export async function extract(text: string) {
  const { object } = await generateObject({
    model: anthropic("claude-sonnet-4-5"),
    schema: ExtractSchema,
    prompt: `Extract people mentioned and a short summary from:\n\n${text}`,
  });

  // object is fully typed from the schema
  console.log(`${object.people.length} people found`);
  return object;
}

generateObject validates the response against the schema and throws a typed error when the output doesn't match. You get a typed object, not a string.

import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";

const client = new OpenAI();

const r = await client.responses.parse({
  model: "gpt-5",
  input: [{ role: "user", content: "Extract a person from: Ada Lovelace, mathematician" }],
  text: { format: zodTextFormat(z.object({ name: z.string(), role: z.string() }), "person") },
});
console.log(r.output_parsed); // { name: "Ada Lovelace", role: "mathematician" }

OpenAI's zod helpers convert a Zod schema into the JSON schema format the API wants, and parse() returns the response as typed data.
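For intuition, the helper's job is to serialize the Zod schema into the named, strict JSON schema format the API expects. A sketch of roughly what a two-field person schema becomes (the exact serialization may differ between SDK versions):

```typescript
// Roughly what the Zod helper produces for
// z.object({ name: z.string(), role: z.string() }) named "person".
// Sketch only; the real helper generates this from the schema.
const personFormat = {
  type: "json_schema" as const,
  name: "person",
  strict: true, // the model's output must match the schema exactly
  schema: {
    type: "object",
    properties: {
      name: { type: "string" },
      role: { type: "string" },
    },
    required: ["name", "role"], // strict mode lists every key
    additionalProperties: false, // no extra keys allowed
  },
};
```

Seeing the target format explains why the Zod schema can double as both validation and model instructions: everything you declare travels to the API.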
The big idea: never parse JSON by hand from an LLM. Zod + generateObject turns the model into a typed function.
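The quiz below asks about .describe() and capping array sizes, which the example above leaves out. A sketch of the same extraction schema with those annotations added (the cap of 20 people is an illustrative choice):

```typescript
import { z } from "zod";

// .describe() text is embedded in the JSON schema sent to the model,
// so each annotation doubles as field-level prompt guidance.
const AnnotatedSchema = z.object({
  people: z
    .array(
      z.object({
        name: z.string().describe("Full name exactly as written in the text"),
        role: z.string().optional().describe("Job title or role, if stated"),
        email: z
          .string()
          .email()
          .optional()
          .describe("Email address, only if one appears verbatim"),
      })
    )
    // Cap the array so a verbose model cannot balloon the output
    // (and your token bill) with marginal entries.
    .max(20),
  summary: z.string().max(240).describe("One- or two-sentence summary"),
});
```

This is the dual-purpose idea the quiz probes: the same schema is compile-time documentation for your code and runtime instructions for the model.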
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-progx-structured-output-zod-creators
What is the primary advantage of using generateObject with a Zod schema instead of manually parsing JSON from an LLM response?
How does adding .describe() to a Zod schema field help when working with LLMs?
Why should you add .max() to array fields in a Zod schema used with an LLM?
What does the lesson identify as the fundamental problem with manually parsing JSON from LLM output?
In the context of this lesson, what is Zod primarily used for?
The lesson describes a Zod schema as serving two purposes simultaneously. What are they?
What specific problem does generateObject solve that raw LLM API calls don't?
What does the lesson mean when it says Zod + generateObject 'turns the model into a typed function'?
If you don't limit array sizes in your Zod schema, what practical risk do you face?
Why is using .describe() on an email field specifically recommended in the lesson?
What is the key difference between manual JSON parsing and using generateObject with validation?
What happens when generateObject receives output that doesn't match your Zod schema?
What is the main benefit of having the schema serve as both documentation and prompt?
Why does the lesson call parsing JSON from prose a 'trap'?
Which AI SDK method does the lesson specifically recommend for structured output?