Anthropic's SDK in 20 lines. Learn messages, streaming tokens, and basic error handling.
Claude's API takes a list of messages and returns a reply. Streaming yields tokens as they are generated so users see output immediately.
npm install @anthropic-ai/sdk
export ANTHROPIC_API_KEY=sk-ant-...

SDK + env var. That is the setup.

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
export async function ask(prompt: string): Promise<string> {
const res = await client.messages.create({
model: "claude-opus-4-7",
max_tokens: 1024,
system: "You are a concise coding tutor.",
messages: [{ role: "user", content: prompt }],
});
const block = res.content[0];
if (block.type !== "text") throw new Error("expected text block");
return block.text;
}

messages.create returns content blocks. Narrow on type before accessing text.

export async function askStreaming(prompt: string) {
const stream = client.messages.stream({
model: "claude-opus-4-7",
max_tokens: 1024,
messages: [{ role: "user", content: prompt }],
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta"
) {
process.stdout.write(event.delta.text);
}
}
const final = await stream.finalMessage();
console.log("\nstop_reason:", final.stop_reason);
}

Stream events are typed. Filter for text_delta and write tokens as they arrive.
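The quiz below asks how to handle 529 (overloaded) errors. A minimal retry sketch with exponential backoff — `withOverloadRetry` and `backoffMs` are illustrative names, not SDK APIs, and the wrapper only assumes the thrown error carries a numeric `status` (as the SDK's APIError does):

```typescript
// Exponential backoff delay: 500ms, 1s, 2s, 4s, ...
export function backoffMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Wrap any async call; retry only on 529 (overloaded), rethrow everything else.
// A tight loop would hammer an already-overloaded server; backoff gives it room.
export async function withOverloadRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      if (err?.status === 529 && attempt < maxAttempts - 1) {
        await sleep(backoffMs(attempt));
        continue;
      }
      throw err;
    }
  }
}
```

Usage would be `withOverloadRetry(() => ask(prompt))`, leaving the ask function itself unchanged.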
The big idea: messages.create for batch, messages.stream for UI. Narrow on block types, and handle 529 like a grown-up: exponential backoff, not a tight retry loop.
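The quiz also covers prompt caching. A sketch of the request shape, assuming the documented `cache_control: { type: "ephemeral" }` field on a system content block; `cachedRequest` is a hypothetical helper, and the model name is copied from the lesson's snippets:

```typescript
// Shape of a system content block that opts into prompt caching.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Build a request body whose (long) system prompt is marked for caching,
// so repeated calls reuse the cached prefix instead of reprocessing it.
export function cachedRequest(systemPrompt: string, userPrompt: string) {
  return {
    model: "claude-opus-4-7",
    max_tokens: 1024,
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ] as SystemBlock[],
    messages: [{ role: "user" as const, content: userPrompt }],
  };
}
```

Pass the result straight to the client, e.g. `client.messages.create(cachedRequest(tutorPrompt, prompt))`; the cache pays off when the same system prompt is reused across many calls.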
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-progx-claude-api-streaming-creators
What two things does the Claude API process as part of its core functionality?
What is the primary advantage of using streaming when calling the Claude API?
What parameter and value should you add to a system prompt block to enable caching across multiple API calls?
What does HTTP error code 529 indicate when calling the Claude API, and how should it be handled?
Why is retrying with exponential backoff preferred over using a tight loop after receiving a 529 error?
Which Claude API method should you use when building a real-time chat interface where users want to see responses appear word-by-word?
What does the lesson mean by 'narrow on block types' when processing Claude API responses?
Approximately what percentage of cost savings can prompt caching provide on the cached portion of a system prompt?
When should you use messages.create instead of messages.stream for Claude API calls?
What is a 'content block' in the context of Claude API responses?
In the context of the Claude API, what does 'streaming tokens' mean technically?
What is the primary purpose of the 'ephemeral' cache type in prompt caching?
What happens when you use a 'tight loop' to retry after receiving overloaded (529) errors from the Claude API?
Which scenario best demonstrates the 'Messages In, Tokens Out' concept from the API?
What type of application would benefit most from using messages.stream over messages.create?