Tackling JSON Perplexity in LLM Outputs: A Weekend Project
Josiah Bryan
Posted on April 15, 2024
This weekend, I dove deep into a problem we often encounter in natural language processing (NLP): ensuring the accuracy and reliability of JSON outputs from large language models (LLMs), particularly when dealing with key/value pairs.
The Challenge
We frequently face the issue of not having direct methods to measure perplexity or log probabilities on function calls from LLMs. This makes it tough to trust the reliability of the JSON generated by these models, especially when it's critical to ensure that each key and value in our outputs not only makes sense but is also based on predictable patterns.
My Solution
To address this, I developed a robust JSON parser. The goal was to extract JSON directly from the stream of log probabilities provided by OpenAI when generating text outputs that contain JSON elements. This parser isn't just about pulling JSON out of the text—it's smart enough to calculate the perplexity and probabilities for each key/value, ensuring that what we get is as accurate as it can be. While JSON parsing can get a bit complex, and my solution isn't flawless, it has passed all my tests and is proving quite robust for my needs.
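The scoring itself is straightforward: the probability of a key or value is the product of its tokens' probabilities, and its perplexity is the exponent of the negative mean log probability. Here's a simplified sketch of that math (my illustration of the idea, not the parser's actual internals):

// Simplified sketch: score a span of tokens (a key or a value) given the
// log probabilities reported by the LLM for each token.
function scoreSpan(tokenLogprobs) {
  const sum = tokenLogprobs.reduce((acc, lp) => acc + lp, 0);
  return {
    prob: Math.exp(sum), // product of the individual token probabilities
    perplexity: Math.exp(-sum / tokenLogprobs.length), // exp of the negative mean logprob
  };
}

// e.g. scoreSpan([-0.00001, -0.00002, -0.000013])
//   ≈ { prob: 0.999957, perplexity: 1.000014 }
// which lines up with the per-field scores shown below.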
For example, given a JSON object generated by an LLM such as:
{ formalName: 'Josiah Bryan', nickname: 'Joey', ageGuess: 28 }
For that same object, my parser can generate a metadata object with the following data:
{
  formalName: {
    key: 'formalName',
    value: 'Josiah Bryan',
    keyProb: 0.999996,
    valueProb: 0.999957,
    keyPerplexity: 1.000001,
    valuePerplexity: 1.000014,
    finished: true
  },
  nickname: {
    key: 'nickname',
    value: 'Joey',
    keyProb: 0.999996,
    valueProb: 0.872926,
    keyPerplexity: 1.000004,
    valuePerplexity: 1.070314,
    finished: true
  },
  ageGuess: {
    key: 'ageGuess',
    value: 28,
    keyProb: 0.999994,
    valueProb: 0.594872,
    keyPerplexity: 1.000003,
    valuePerplexity: 1.681035,
    finished: true
  }
}
(The finished prop in this example is useful when parsing a stream of chunks. When parsing JSON from a firehose like that, the finished prop is false while the parser is still consuming more tokens for the value. Once the parser hits an end token (e.g. a trailing comma or closing quote), it flips finished to true so you know the value is final.)
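For instance, a consumer of the streamed metadata might only act on a field once finished flips to true. A tiny, purely illustrative sketch (the callback wiring is hypothetical, not the parser's actual API):

// Illustrative only: log partial values as they stream in, but only treat a
// field as usable once the parser marks it finished.
function onMetadata(meta) {
  for (const [field, info] of Object.entries(meta)) {
    if (info.finished) {
      console.log(`${field} = ${JSON.stringify(info.value)} (valuePerplexity ${info.valuePerplexity})`);
    } else {
      console.log(`${field} (still streaming): ${info.value ?? ''}`);
    }
  }
}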
Why It's Cool
This is made practically useful with a custom yup decorator that actively manages the model's output. If the parser detects that the perplexity of a generated value goes above our comfort threshold, it can automatically tweak the prompt or inject additional grounding into the model's inputs. This ensures that the generated JSON is not only precise but also deeply rooted in factual accuracy.
For example, here's how the schema is specified, with custom max perplexity values per field:
const schema = yup.object().shape({
  formalName: yup
    .string()
    .required()
    .description('Formal name')
    .perplexity({ max: 1.125 }),
  nickname: yup
    .string()
    .required()
    .description('Generated nickname')
    .perplexity({ max: 1.5 }),
  ageGuess: yup
    .number()
    .required()
    .description('Generated age guess')
    .perplexity({ max: 99 }),
});
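The .description() and .perplexity() helpers above aren't stock yup; they come from the custom decorator mentioned earlier. As a rough guess at the wiring (not the actual implementation), yup.addMethod can stash those values in each field's metadata so a later step can read them back per field:

import * as yup from 'yup';

// Assumed sketch: register custom helpers that store the description and the
// max-perplexity threshold in the schema's metadata.
for (const type of [yup.string, yup.number]) {
  yup.addMethod(type, 'description', function (text) {
    return this.meta({ description: text });
  });
  yup.addMethod(type, 'perplexity', function ({ max }) {
    return this.meta({ maxPerplexity: max });
  });
}

// Later, the thresholds can be recovered via schema.describe(), e.g.
// schema.describe().fields.nickname.meta
//   → { description: 'Generated nickname', maxPerplexity: 1.5 }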
Then, when passing that to the coaxLlm method, we can also include a callback to add more grounding when perplexity is too high on a given field:
const { content, object, objectWithMetadata, failure } = await coaxLlm({
  prompt,
  schema,
  logger,
  langfuseTrace,
  cacheMode: 'save',
  failureInjectCallback: async ({ type, path }) => {
    if (
      type === 'perplexity' &&
      ['nickname', 'formalName'].includes(path)
    ) {
      return [`My name is: "${authorization.user.name}"`];
    }
    return [];
  },
});
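For completeness, here's roughly how I read the results back out; the exact shape of failure is specific to my coaxLlm implementation, so treat this as illustrative:

if (failure) {
  // The model never got under the perplexity thresholds, even with the
  // injected grounding; handle it as an error or fall back.
  logger.warn('Could not ground the LLM output', failure);
} else {
  logger.info('Parsed object:', object);
  logger.info('ageGuess perplexity:', objectWithMetadata.ageGuess.valuePerplexity);
}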
Just in time for a busy upcoming week, this tool has become an indispensable asset in my toolkit, enhancing the grounding of LLM outputs and significantly speeding up JSON generation—a win-win for any developer.
Check Out the Code
Interested in seeing this in action or integrating it into your own projects? Here’s the link to the full code on how to coax and re-ground the LLM effectively: coax-llm.js.
Bonus: Real-Time Streaming
This parser also works seamlessly with streaming outputs from LLMs. This means we can fetch JSON objects and log probabilities in real-time, without waiting for the entire text generation to complete. It’s efficient and allows for immediate adjustments or error handling, boosting both performance and reliability.
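As a rough sketch of what that looks like with the OpenAI Node SDK (assuming logprobs are enabled on a streamed chat completion; the logprobsToAnnotatedJson call is paraphrased, so check the gist below for the real signature):

import OpenAI from 'openai';
// The parser from the gist linked below; exact signature paraphrased here.
import { logprobsToAnnotatedJson } from './logprobsToAnnotatedJson.js';

const openai = new OpenAI();
const prompt = 'Introduce yourself as JSON with formalName, nickname, and ageGuess.';

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  logprobs: true,
  stream: true,
});

// Accumulate token logprobs as chunks arrive and re-parse incrementally;
// fields still being generated come back with finished: false.
const tokens = [];
for await (const chunk of stream) {
  tokens.push(...(chunk.choices[0]?.logprobs?.content ?? []));
  console.log(logprobsToAnnotatedJson(tokens));
}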
Dive Deeper
For those who love digging into the nuts and bolts, here’s a direct link to the parser itself: logprobsToAnnotatedJson.js.
While I haven't made the underlying detailed benchwork public, the gists provided are self-contained and full of actionable insights. They're not just theoretical but are primed for real-world application, and I'm using them personally in production (pushing them to my k8s cluster tonight, even as I type).
Looking forward to your thoughts and any feedback you might have!