I’ve implemented numerous features using ChatGPT, in particular making extensive use of the OpenAI Assistant API, such as for automating the creation of titles based on task descriptions in a home maintenance app I’m working on.
Recently, while updating the config for an Assistant, I came across a new and seemingly innocent-looking option: structured outputs.
The feature isn’t immediately obvious within the OpenAI Assistant config, and to be honest, I actually came upon it by chance. It doesn’t seem like much of a change: basically request the output be returned in JSON format rather than plain text.
But after starting to play with it, wow, was I blown away by the potential. In this blog post, I’ll talk about why this updated output format is a big deal, show an example use case, and some potentially less obvious factors to be aware of when making use of this feature.
From providing information to enabling features
What’s the impact of changing your assistant to use structured output? For comparison the more conventional chat output, plain text, or in some cases, formatted text, is intended to be read and used by a person.
Structured output, on the other hand, is meant to be used by computer programs. When I am able to define the schema, or structure, of the response from an AI assistant, I have effectively turned the assistant output into a kind of API endpoint.
I can use it like any other data source and effectively turn Assistant output into the basis for an app feature. Let’s look at a quick example.
A quick example
If you are new to creating and using Assistants, they are configured at two key levels:
The general assistant instructions (or “system instructions”), which provides overarching instructions for all prompts.
The prompt, which is the specific instructions for one particular output.
Here’s an example of this, in which we use an Assistant to create a trivia game.
First, we add these general instructions:
The key portion here is the second part, where we describe a specific schema for the ouput.
Then, when we want to create a new game, we just submit this simple prompt:
When we run this, we’ll get output that will always be structured as in the example below. The set of questions we get back will change but the structure should remain identical.
Now, that may not seem like a big deal, but in fact, from the above output, we can create a fully working trivia game. 🚀
To dig into the details of how I used the above and turned it into a fully working app, take a look at the source code.
This is of course just a toy app example, but I am guessing you see the potential here. Structured output turns Assistants into a kind of infinite API that is only limited by your imagination.
At the same time, there are some gotchas to be aware of…
How structured output impacts your approach to creating and using assistants
While digging into the structured output feature, a few key considerations stood out:
How you configure your assistant is turned upside down compared to non-structured output.
Somewhat unintuitively, NOT selecting the Structured JSON output option is better.
You’ll need to manage and likely persist assistant ouput.
How you configure your assistant is turned upside down
Normally, when configuring an Assistant, I find it to be best practice to provide broad assistant instructions and then provide specifics in each prompt. In the case of non-structured output, this makes sense because the specifics of what is requested will only be known at the point when the prompt is submitted.
For example, for a home maintenance assistant, the overall instructions might be to always provide brief responses, ask a maximum of two questions, and always include DIY tips. Then, in the prompt, we state the specific home maintenance task we’d need help with.
Fully describe the use case in the general instructions and only use the
prompt to define output variables.
However, what I’ve realized when working with structured output is that you
should turn this concept on its head. By configuring an assistant to provide
structured output, you are effectively narrowing its purpose to a single use
case, as defined by the output structure. Therefore, you should fully describe
the use case in the general instructions and only use the prompt to request an
instance of that output.
One can think of the general instructions as your function, and the prompt as a function call.
Building on this metaphor, we can add “arguments” in the general instructions and then include the argument values in our prompts. For example, let’s say we updated the general instructions in the Trivia Game example to be as follows:
Note that we excluded the question quantity in the general instructions. With this change, our prompt “function call” might look like this:
Obviously, we could add many more “arguments”, eg categories to include, level of difficulty, etc. but the main point is that all the details go in the general instructions and only the differences for the specific instance are included in the prompt.
Somewhat unintuitively, NOT selecting the Structured JSON output option is better
Another aspect of working with structured output that led to some initial confusion was whether to use the json_object vs json_schema response option.
Initially, I figured that since I want output to be restricted to a specific shape that the json_schema option would be the way to go. While that technically did work, it was more work to create. Even when using the built-in schema generator, I ran into errors when I needed to make any updates.
Using json_object allows for keeping all instructions in one place.
Instead, just describing the schema in the general instructions and then using
the json_object option was both much simpler to create and maintain, and it
also meant my overall instructions and the schema definition were all in one
place.
So far, I have not found a good use case for using json_schema over json_object.
You’ll need to manage and likely persist assistant ouput
When I initially was playing around with the trivia game, I’d generate the game output, display the game ui and was able to play a game no problem. But at some point I made the “mistake” of refreshing my page mid-game, which resulted first in a delay while a new game was being created, followed by my trivia game being replaced with a brand new game. Obviously, once we’ve started playing a game, we don’t want it to suddenly be replaced by a different game. 😞
While assistants will generate the output you’ve described, it’s then up to you
to ensure the output is properly managed.
Expect to add caching and/or database persistence and possibly store other
values, like thread ids.
In the case of my trivia game, I needed to cache my game output, so that, for
example, when a user refreshes the page, the same trivia game is displayed and
with no delay. To achieve this, I generated an id with each ouput and used that
to place it in a memory store. I also decided to incude the Assistant
thread
id, to help ensure each game is different for a particular user.
The specifics of how you manage the output will vary, but you will almost certainly need to persist it in some fashion, be it in a cache and/or database.
Overall, I’m very excited about the potential of structured ouput. Hopefully, you’ll also discover some interesting use cases. To get an idea of how you might make use of the structured output feature, take a look at the sample demo).