The Responses API is OpenAI's modern surface. One call, text and tools. Learn the shape you'll use most.
OpenAI ships chat.completions (classic) and responses (modern). New code should prefer responses — it unifies text, tools, and structured output.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    try:
        r = client.responses.create(
            model="gpt-5",
            input=[
                {"role": "system", "content": "Be concise."},
                {"role": "user", "content": prompt},
            ],
        )
        return r.output_text
    except Exception as e:
        print(f"OpenAI call failed: {e}")
        raise

print(ask("Explain recursion in one sentence."))

`output_text` is a convenience accessor that concatenates all text in the response.

def ask_stream(prompt: str) -> None:
    with client.responses.stream(
        model="gpt-5",
        input=[{"role": "user", "content": prompt}],
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                print(event.delta, end="", flush=True)
        stream.until_done()
    print()

The context manager ensures the stream closes. Event types are strings; filter for the text delta.
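One failure mode the snippets above don't handle is HTTP 429 (rate limit). The right response is to wait and retry with exponentially growing delays rather than hammering the endpoint. A minimal sketch, using a stand-in exception and a fake call so it runs offline; in real code you would catch the SDK's rate-limit error instead:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for the SDK's 429 error (openai.RateLimitError in real code)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn, retrying rate-limited attempts with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the error
            # Delays grow 1s, 2s, 4s, ...; jitter keeps many clients
            # from all retrying in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))

# Demo: fail twice with a simulated 429, then succeed on the third attempt.
attempts = 0
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimited()
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
print(result)  # "ok", after two short backoff waits
```

Without backoff, each immediate retry counts against the same rate limit, so the 429s keep coming; the doubling delay gives the quota window time to reset.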
The big idea: use responses.create for the modern path, stream for UIs, and centralize model IDs so provider swaps are painless.
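The "centralize model IDs" point can be sketched concretely. `MODEL` and the stub client below are illustrative names, not part of the SDK; the stub only exists so the example runs offline:

```python
MODEL = "gpt-5"  # single source of truth; change it once, every call site follows

def ask(client, prompt: str) -> str:
    # Call sites never spell out the model string themselves.
    r = client.responses.create(
        model=MODEL,
        input=[{"role": "user", "content": prompt}],
    )
    return r.output_text

# Offline demo: a stub client that records which model it was asked for.
class _StubResponses:
    def create(self, model, input):
        self.last_model = model
        return type("R", (), {"output_text": f"[{model}] reply"})()

class StubClient:
    def __init__(self):
        self.responses = _StubResponses()

client = StubClient()
print(ask(client, "hi"))  # prints "[gpt-5] reply"
```

With the constant in one module, upgrading to a newer model (or swapping providers) is a one-line change instead of a hunt through 15 files.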
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-progx-openai-api-creators
A team has hard-coded the model ID 'gpt-4o' in 15 different source files. What problem does this create when OpenAI releases a better model?
Your application receives a 429 (rate limit) response from the OpenAI API. What strategy should you implement to handle this correctly?
What functionality does the `.with_raw_response` method provide in the OpenAI Python SDK?
In the context of the OpenAI API, what is streaming primarily used for?
What function is used to create a request in the modern Responses API?
A developer stores the model ID in a single config constant instead of hard-coding it throughout the application. What is the primary benefit of this approach?
What types of output does the Responses API unify in a single call?
What is exponential backoff in the context of API error handling?
When should developers prefer the Responses API over chat.completions?
What does the `output_text` property contain in an OpenAI API response?
Why is streaming particularly beneficial for chat applications with user interfaces?
What happens if an application repeatedly ignores 429 responses without implementing backoff?
What is the advantage of using `.with_raw_response` combined with a retry library like tenacity?
What does a 429 HTTP status code indicate when received from an API?
For a real-time chat application where users expect instant feedback, which API feature is most important to implement?