bezbos.
Posted on March 24, 2023
Introduction
The Completions API is the most fundamental OpenAI API. It provides a simple interface that’s extremely flexible and powerful. You give it a prompt and it returns a text completion, generated according to your instructions. You can think of it as a very advanced autocomplete, where the language model processes your text prompt and tries to predict what’s most likely to come next.
Although simple to use, the Completions API is also very customizable and exposes various parameters that affect how completions are generated (for better or worse).
This guide explains all the parameters with practical examples. After reading it you will have a deeper understanding of the Completions API and you will be able to apply this knowledge in your day-to-day work with OpenAI APIs.
For the examples in this article, I will use Postman for sending HTTP requests. I suggest you do the same, but you can follow along with just about any HTTP client. You can also generate and customize text completions in the OpenAI playground or code completions in the OpenAI JavaScript Sandbox.
Although I will be using Postman for sending requests, I have written a guide on how to integrate OpenAI APIs in JavaScript projects.
Also, a video of said guide is available on YouTube:
Completion Fundamentals
The best way to learn about completions is to generate them. We’re going to see what kind of request we have to send, where to send it and what’s the response. I will start with the simplest possible request and we’ll build up from there.
Basic Completion Request
To send a request to the Completion endpoint, we need to create a collection in Postman that will contain the OpenAI requests (you don’t really have to do this, but it pays to be tidy). I will name it “OpenAI APIs”:
Now let’s add an authorization scheme for the entire collection:
Don’t forget to save by pressing Ctrl + S. You can obtain the API key by going to your OpenAI profile and selecting “View API Keys”, which leads to: https://platform.openai.com/account/api-keys.
Make sure to save the key in your password manager or somewhere safe, because OpenAI will not let you see your API key again. If you lose it, you will have to generate a new one.
The Completion endpoint is accessible at https://api.openai.com/v1/completions and accepts a POST request with a JSON payload and Bearer token authorization.
Here is how your Postman request should be set up:
The request body must be raw JSON:
Here is the request in popular cURL format:
curl --location 'https://api.openai.com/v1/completions' \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"model": "text-davinci-003"
}'
So we have to send the Authorization header, otherwise we’ll get a 401 Unauthorized response, and the Content-Type header must be set to application/json to indicate that we’re sending a JSON payload. Speaking of the payload, the request body is a JSON object containing at least the model parameter, which is the ID of the language model used to generate completions.
At the time of this writing, “text-davinci-003” is the latest completion model in the GPT-3 family. It also supports inserting completions within text, which other models don’t. I’m going to pick “text-davinci-003”. You can choose that or another model if OpenAI recommends it.
You can find the list of all OpenAI language models here: https://platform.openai.com/docs/models/gpt-3-5
You can try sending the request now and you should get back a 200 OK response. Don’t mind the completion: since we didn’t send a prompt, you will receive some random text.
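If you prefer code over Postman, the same request can be assembled in a few lines. The following is only a sketch in Python using the standard library; the helper name build_completion_request is made up for this example, and API_KEY is a placeholder you would replace with your own key. It builds the request without sending it, so you can inspect the headers and body first.

```python
import json
import urllib.request

COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key, model, **params):
    """Build (but do not send) a POST request for the Completions endpoint.

    Hypothetical helper for illustration; extra keyword arguments such as
    prompt or max_tokens are copied into the JSON body as-is.
    """
    body = {"model": model, **params}
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The bare-bones request from above: only the model parameter is set.
request = build_completion_request("API_KEY", "text-davinci-003")

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
```

Keeping the building step separate from the sending step makes it easy to log or test the payload before spending any tokens.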
Completion Response
Let’s take a look at a Completion response:
{
"id": "cmpl-6upcTCWlFVl8J8pZnSyn1HIDw8wU8",
"object": "text_completion",
"created": 1679002813,
"model": "text-davinci-003",
"choices": [
{
"text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 1,
"completion_tokens": 16,
"total_tokens": 17
}
}
It’s a JSON response containing the properties id, object, created, model, choices and usage. The most important property is choices, because it contains the Completion, while all the other properties are just metadata.
id: the unique identifier of the response, useful in case you need to track responses;
object: the response type. In this case it’s “text_completion”, but if you called a different endpoint, like the Edit endpoint, the object would be “edit”;
created: a UNIX timestamp marking the date and time at which the response was generated;
model: the OpenAI model that was used for generating the response, in this case “text-davinci-003” or whatever you’ve used;
choices: a collection of completion objects;
usage: the number of tokens used by this request, split into prompt tokens, completion tokens and their total.
The choices property is the most important, since it contains the actual completion data, or possibly the data of multiple completions:
// Completion Response Object
{
"text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
Let’s break down each property inside the Completion object:
text: the actual completion text;
index: the index of the completion inside the choices array;
logprobs: an optional object with the log probabilities of the generated tokens and of the alternative tokens that were considered at each position;
finish_reason: the reason why the language model stopped generating text. The two most common reasons (and the only ones I’ve ever got) are "stop" (the completion finished naturally) and "length" (the completion hit the token limit before it could finish).
That was a bare-bones completion with minimal request parameters. I’ve explained almost all the response properties, but some were null because they require a specific parameter to be defined, e.g. the logprobs response property requires the logprobs request parameter to be defined.
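To make the response structure concrete, here is a small Python sketch that pulls the useful parts out of a parsed response. It assumes the response has already been decoded from JSON into a dictionary; the function name extract_completions is hypothetical and only serves this example.

```python
def extract_completions(response):
    """Return (index, text, finish_reason) for every choice in a response."""
    return [
        (choice["index"], choice["text"], choice["finish_reason"])
        for choice in response.get("choices", [])
    ]


# The response shown above, already parsed into a Python dictionary.
sample_response = {
    "id": "cmpl-6upcTCWlFVl8J8pZnSyn1HIDw8wU8",
    "object": "text_completion",
    "created": 1679002813,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
            "index": 0,
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 1, "completion_tokens": 16, "total_tokens": 17},
}

completions = extract_completions(sample_response)
```

Everything outside choices is metadata, so in most applications this is the only part of the response you need to read.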
Max Tokens and Token Consumption
OpenAI charges you by the number of tokens you use. This means that if you don’t structure your prompts carefully and don’t set max token limits, you will quickly burn through your token quota. This is why it’s important to structure your Completion prompts to return exactly what you need and to set token limits. Any excess information that a Completion returns is waste that consumes unnecessary tokens.
For this purpose, OpenAI provides a simple mechanism that limits the number of tokens a Completion can return. It’s called max_tokens and it’s a request body property. Simply put, this parameter sets the maximum number of tokens a Completion may generate. If the Completion text cannot fit within that limit, it will be cut off and the response property finish_reason will be set to "length", indicating that the Completion was indeed cut off early:
// Bad completion response (exceeded max token size)
{
...
"finish_reason": "length"
}
Let’s see this parameter in action. I’m going to ask OpenAI to explain JavaScript objects which will certainly take plenty of tokens. Let’s first do it with a decent maximum token size of 1024 tokens:
// Completion request with decently sized "max_tokens" parameter
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024
}
The completion should finish properly and the finish_reason should be "stop":
// Completion response that doesn't exceed max_tokens size
{
...
"choices": [
{
"text": "\n\nJavaScript objects are data structures used to store, retrieve, and manipulate data. Objects contain properties with arbitrary names, and values of any data type, including other objects, functions, or even arrays. Objects enable developers to create hierarchical and logical data structures that can be passed between functions, providing more flexibility and scalability of applications. For example, an object might consist of a 'name' property and a 'age' property, and the logic within the application would reference the two properties inside a single object.",
...
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 104,
"total_tokens": 109
}
}
This is good, but what if we wanted to explain JavaScript objects in under 30 tokens? Well, we can set the max_tokens parameter to 30:
// Completion request with small "max_tokens" size
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 30
}
The completion most likely couldn’t fit into 30 tokens, so the finish_reason will be "length":
// Completion response that exceeds the maximum token size
{
...
"choices": [
{
"text": "\n\nJavaScript objects are collections of key-value pairs that provide functionality and data storage within a JavaScript program. An object is a container that can",
...
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 30,
"total_tokens": 35
}
}
That’s not good, but we can make it work by specifying the token limit in the prompt itself. That way the language model is aware of the completion size limit and will do its best to stay within it:
// Completion request with the token limit defined inside the prompt
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 30 or less tokens.",
"max_tokens": 30
}
This Completion should be generated successfully without exceeding the maximum token size:
// Completion response that doesn't exceed maximum token size
// because it was defined in the prompt
{
...
"choices": [
{
"text": "\nObjects are data types that store collections of key/value pairs.",
...
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25
}
}
That works nicely. So the takeaway from this lesson is that you can use the max_tokens request parameter to limit the token usage and that you can also specify completion constraints inside the prompts themselves.
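In application code you will usually want to detect a cut-off completion automatically rather than by reading the response by eye. Below is a minimal Python sketch of that check; truncated_choices is a hypothetical helper, and the two dictionaries are shortened versions of the responses shown above.

```python
def truncated_choices(response):
    """Return the indexes of choices that hit the max_tokens limit."""
    return [
        choice["index"]
        for choice in response["choices"]
        if choice["finish_reason"] == "length"
    ]


# Shortened version of the response to the request with max_tokens set to 30.
cut_off = {
    "choices": [
        {"text": "\n\nJavaScript objects are collections of ...", "index": 0,
         "finish_reason": "length"}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 30, "total_tokens": 35},
}

# Shortened version of the response where the prompt itself states the limit.
complete = {
    "choices": [
        {"text": "\nObjects are data types that store collections of key/value pairs.",
         "index": 0, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25},
}

needs_retry = truncated_choices(cut_off)
is_fine = truncated_choices(complete)
```

A non-empty result is a signal to either raise max_tokens or tighten the prompt, depending on which of the two you can afford.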
Temperature and Top Probabilities
When generating completions, the language model builds the text one token at a time, assigning a probability to every candidate token and picking the next token according to those probabilities. The OpenAI Completions API exposes two parameters, temperature and top_p, which can be used to affect how consistent or random the completions will be.
Temperature
The temperature parameter affects how random or deterministic your completions are. If you want more creative answers, you might want to bump it above 0.8, while if you want consistent, predictable completions, you should probably keep it below 0.2.
The default temperature is 1, so by default you’re getting somewhat more creative responses. The maximum value is 2, but I don’t recommend exceeding 1, because beyond that point you start getting gibberish.
Let’s see the difference between the temperatures. I’m going to ask it to write a short description for a business card of a software developer:
// Completion request with minimal temperature
{
"model": "text-davinci-003",
"prompt": "Write me a short description for a business card of a software developer.",
"max_tokens": 1024,
"temperature": 0
}
Here is the response:
{
...
"choices": [
{
"text": "\n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.",
...
"finish_reason": "stop"
}
],
...
}
Try sending this request multiple times and observe the differences (or similarities) in completions. Since the temperature was set to 0, the completions should be identical to each other:
Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
This happens because a low temperature (below 0.2) makes the language model strongly favor the highest-probability tokens, and at 0 it effectively always picks the most likely one, resulting in deterministic and consistent responses.
Let’s bump the temperature to 1 and observe what happens:
// Completion request with default temperature
{
"model": "text-davinci-003",
"prompt": "Write me a short description for a business card of a software developer.",
"max_tokens": 1024,
"temperature": 1
}
I will send this request three times. Every response should have a different completion:
Request 1: \n\nMax Smith \nSoftware Developer | Full Stack Engineer \nCreative problem solver building user-friendly applications, using cutting-edge technology to design innovative web and mobile solutions.
Request 2: \n\nSoftware Developer | Experienced in building custom web and mobile applications | Creating innovative solutions that make a difference.
Request 3: \n\nMarcus Miller, Software Developer\nDeveloping innovative solutions to complex software challenges. Experienced in a range of platforms, languages and frameworks.\n
Notice how each completion is quite different. Higher temperature values (above 0.8) result in more random and creative completions.
The maximum temperature is 2, but I don’t recommend going above 1, because the completions degrade into gibberish.
// Completion request with maximum temperature
{
"model": "text-davinci-003",
"prompt": "Write me a short description for a business card of a software developer.",
"max_tokens": 1024,
"temperature": 2
}
Here are three responses I got with maximum temperature:
Request 1: \n\nJament Swift, Software Developer\nExotic adeptness – Proven top performance - Blazing creativity\nCreat > Navigate > Enhance\nproduective.gap@gmailfivece.est?.withvi.?+abrfareache 291990qy 282$
Request 2: \n\nALCLAIRE ATKFlN — \nInnovative Software Developer, using core web languages (html/css, JS) rrapYdatft tdislocic dwoolsrrFWivelling Siteadaptic ad^licationflexible, into easetomizingeds Sike Lrateting interactive experionees to enterprise solve collagforatcompliceso solaring emge neluphey users availbinglaruaubkusgers uolstepopueume building cost tap obefficifty crobltecisai menuring benefitrationque apeveremonpower native pertetrificial evougcnvi ightemecatedinexweb opmateslenattucvity dynamic imedesiom
Request 3: \n\nEmily Abraham\nSoftware Developer \nWeb-Aspiration Design Concierge helping advanced beginners hot wire &or custom dice intricate most more elaborate internet affirming life fully tunning guides built web real & specially bold cake&ml* hacking quirks multi stake protocol implementing htmlified1X complexes data flavors ..fitting daal purclient industry webs philosophy platform constructionry into solututionsquest factory revital elements formulation classic©cool proxCzy stylographisch creators ideas live^ress comprehensive parameters elite logicplus dedection solutions powerfully lives!? - allowing automate leads flourish ?:add trendy ultra CRAW agility rise fancounters dr liveinst integration lux abstract trlogodrieind explagement animenthuebotirellucix manageengrabculaticiel absmprocessanceios fowlperistetcukhovosoobybugortdashymaginthasiszagativesDiaPubotomicsettingobulencealorreattheryangpreoontowpurjustoniKeyoulartaitherwerefaislLeodsoftwarebeyetantoisdommaciansitaCdbletw # WeFactor ℂ½ > Source Again0uuuu044future R α Tavelezsdevelopmentthingáprogramoscpower in Zellosaoud boxoloret~Experience Hacker Sparkcelipp Maker_ solutions inside these Creatchiitabi Se🙃sciented via zoante Websinarra space era sitecraftite extraordyena unpressuration Strategate✉ Planning interface fluid Project8tunistsWerstore 4 Kids \"Interframe Paradositionesy techno dream erpo flexitantds totaliter building fungins
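To build some intuition for why the output falls apart at high temperatures, here is a Python sketch of the usual way temperature is applied: the raw token scores (logits) are divided by the temperature before being turned into probabilities. This is the textbook formulation and an assumption on my part, not a description of OpenAI’s internal code, and the scores below are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability goes to the top token.
        best = max(logits, key=logits.get)
        return {token: (1.0 if token == best else 0.0) for token in logits}
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Hypothetical scores for three candidate next tokens.
logits = {" Software": 2.0, " Max": 1.0, " Jament": -1.0}

cold = softmax_with_temperature(logits, 0)     # always the top token
default = softmax_with_temperature(logits, 1)  # standard softmax
hot = softmax_with_temperature(logits, 2)      # flatter distribution
```

The higher the temperature, the flatter the distribution becomes, so unlikely tokens get picked more often, and once a few odd tokens are in the text the following predictions drift further off course.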
Top Probability
Next we have the top_p parameter, which stands for “top probability” and is an alternative to temperature. It refers to the probability mass that should be used when considering the next token in the generated text. Essentially, the model ranks the candidate tokens by probability and only considers the smallest set of top tokens whose combined probability reaches the top_p threshold. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. This means that with lower top_p values, the language model is more conservative with its predictions.
In general, you should use top_p to control the coherence of the generated text, but if you want to affect the creativity and predictability of the text, you should use temperature. OpenAI recommends using either temperature or top_p, but not both.
Let’s see an example of the top_p parameter in action. The default value of top_p is 1, so we’ve already seen how that behaves when we were sending previous requests. Let’s set top_p to 0:
{
"model": "text-davinci-003",
"prompt": "Write me a short description for a business card of a software developer.",
"max_tokens": 1024,
"top_p": 0
}
I’ve sent the request three times. Here are the results:
Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.
Notice that we’re getting an identical response each time. This is because with top_p set to 0, only the single most likely token is considered at each step. So with lower top_p values, you are likely to receive more coherent and consistent responses.
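The filtering that top_p performs can be sketched in a few lines of Python. This is an illustration of the general technique (often called nucleus sampling), under the assumption that at least one token is always kept; the probabilities are invented and nucleus_filter is a name chosen for this example.

```python
def nucleus_filter(probabilities, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append((token, probability))  # the top token is always kept
        cumulative += probability
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens sum to 1 again.
    total = sum(probability for _, probability in kept)
    return {token: probability / total for token, probability in kept}


# Hypothetical probabilities for three candidate next tokens.
probabilities = {" Software": 0.6, " Max": 0.3, " Jament": 0.1}

only_top = nucleus_filter(probabilities, 0)    # just the most likely token
top_two = nucleus_filter(probabilities, 0.8)   # drops the unlikely tail
everything = nucleus_filter(probabilities, 1)  # keeps all candidates
```

With top_p at 0 only one candidate survives, which is why the three requests above came back identical.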
N Completions and Best of N
As we’ve seen, the completion response has the choices property, which is an array of completion objects. This indicates that the Completion API is capable of returning multiple completions and indeed it can! As a matter of fact, it’s also capable of rating the completions and returning the best one.
N Completions
To generate multiple completions, we specify the n request parameter, which simply stands for “number of completions”. Let’s try asking for 2 completions:
{
"model": "text-davinci-003",
"prompt": "Write me a short description for a business card of a software developer.",
"max_tokens": 1024,
"n": 2
}
The response should now contain two completions inside the choices array:
{
...
"choices": [
{
"text": "\n\nSoftware Developer \nSpecializing in developing custom-tailored, reliable and secure software. Providing innovative solutions to meet your business needs.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
},
{
"text": "\n\nSoftware Developer | Helping to Create Innovative Solutions \nExperienced in developing custom software and applications for a variety of businesses. Skilled in troubleshooting and debugging software, optimizing code for maximum efficiency, and providing quality assurance for mission-critical solutions. \n\n\n\nEnthusiastic about testing the boundaries of technology and solving complex problems. Dedicated to helping businesses get the most from their investments in software and technology.",
"index": 1,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 118,
"total_tokens": 132
}
}
Notice how the second completion has the index property set to 1. That’s because it’s the second completion, counting from zero. Also, the token consumption has roughly doubled, which is to be expected, since we generated two completions.
This parameter can be useful when you want the ability to pick multiple completions. For example, let’s say you’re building a joke generator. You might want to generate multiple jokes per session. You can also build on top of this, by counting the number of times a specific joke has been picked from a group and create a rating system which would allow you to present better jokes to people.
Best of N
Another similar, but more powerful parameter is best_of. This parameter tells the language model to generate multiple completions and return the best one, which is the one with the highest log probability per token.
Let’s send a completion request that asks the language model to tell us a JavaScript joke, but we want its best joke. So we will set best_of to 5, which means it will generate five completions on the server and return the best one:
{
"model": "text-davinci-003",
"prompt": "Tell me a joke about JavaScript.",
"max_tokens": 128,
"best_of": 5
}
And here is the result:
{
...
"choices": [
{
"text": "\n\nQ: Why was the JavaScript developer sad?\nA: Because he didn't Node how to Express himself.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 127,
"total_tokens": 134
}
}
Pretty funny, eh. Notice that we only received one completion, which is obvious because the other completions were generated on the server and only the best one was returned. Also notice the token consumption. It’s ~5x of what a normal completion would consume.
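The ranking rule, highest log probability per token, is easy to reproduce on the client if you ever want to rank completions yourself. The Python sketch below shows the idea; the candidate texts and log probabilities are invented for illustration, and the server-side implementation may differ in details.

```python
def mean_logprob(token_logprobs):
    """Average log probability per token; closer to zero means more likely."""
    return sum(token_logprobs) / len(token_logprobs)


def pick_best(candidates):
    """Return the candidate with the highest average log probability per token."""
    return max(candidates, key=lambda c: mean_logprob(c["token_logprobs"]))


# Hypothetical candidates; the numbers are made up for illustration.
candidates = [
    {"text": "joke A", "token_logprobs": [-0.2, -1.5, -0.9]},
    {"text": "joke B", "token_logprobs": [-0.1, -0.3, -0.2, -0.4]},
    {"text": "joke C", "token_logprobs": [-2.0, -0.1]},
]

best = pick_best(candidates)
```

Averaging per token matters because a plain sum would always favor shorter completions, which have fewer negative numbers to add up. Keep in mind that “best” here means most probable according to the model, not funniest.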
N Best of N
An interesting thing about the n and best_of parameters is that we can combine them to get the N best completions. Let’s say we want the 3 best of 5 jokes:
{
"model": "text-davinci-003",
"prompt": "Tell me a joke about JavaScript.",
"max_tokens": 128,
"best_of": 5,
"n": 3
}
So that will return 3 best jokes that the language model could generate, out of 5 generated jokes:
{
...
"choices": [
{
"text": "\n\nQ: Why did the developer go broke?\nA: Because he used JavaScript.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
},
{
"text": "\n\nQ: Why did the chicken cross the road? \nA: To get to the JavaScript library!",
"index": 1,
"logprobs": null,
"finish_reason": "stop"
},
{
"text": "\n\nQ: Why did the chicken cross the playground?\nA: To get to the JavaScript.",
"index": 2,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 110,
"total_tokens": 117
}
}
Note that n cannot be greater than best_of, because you can’t return more completions than you have generated.
Notice the token consumption. It’s ~5x of what a normal completion would consume.
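Because every generated candidate consumes tokens, it helps to estimate the worst case before sending a request. The Python sketch below is a rough upper bound under the assumption that the prompt is counted once and every candidate may use up to max_tokens; worst_case_tokens is a hypothetical helper, not part of any OpenAI library.

```python
def worst_case_tokens(prompt_tokens, max_tokens, n=1, best_of=None):
    """Upper bound on the tokens a single request can consume.

    Assumes the prompt is counted once and that each generated candidate
    may use the full max_tokens allowance.
    """
    candidates = best_of if best_of is not None else n
    if n > candidates:
        raise ValueError("n cannot be greater than best_of")
    return prompt_tokens + candidates * max_tokens


# The joke request: a 7-token prompt, max_tokens of 128 and best_of set to 5.
joke_budget = worst_case_tokens(7, 128, n=3, best_of=5)

# The business card request: a 14-token prompt, max_tokens of 1024 and n set to 2.
card_budget = worst_case_tokens(14, 1024, n=2)
```

Actual usage is normally far below this bound, as the responses above show, but the bound is what you should budget for when setting max_tokens and best_of together.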
Logprobs
Completion language models can return additional metadata about generated completions. For example, you can retrieve the probabilities of alternative tokens for each generated token. To retrieve this metadata we have to set the logprobs parameter. This parameter is the number of most likely tokens for which log probabilities are returned at each position, up to 5. If you need more than 5, contact OpenAI through their Help Center.
By default, no log probabilities are returned. Let’s set logprobs to 3 to see what we get:
// Completion request with logprobs parameter
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 30 or less tokens.",
"max_tokens": 30,
"logprobs": 3
}
In the response, notice that the logprobs property of the completion now contains an object with various log probability metadata fields:
{
...
"choices": [
{
"text": "\n\nJavaScript objects are containers for named values (aka properties) and associated functions (aka methods).",
"index": 0,
"logprobs": {
"tokens": [
"\n",
"\n",
"Java",
"Script",
" objects",
" are",
" containers",
" for",
" named",
" values",
" (",
"aka",
" properties",
")",
" and",
" associated",
" functions",
" (",
"aka",
" methods",
")."
],
"token_logprobs": [
-0.008907552,
-0.07727763,
-1.1294106,
-0.0004160193,
-0.009473672,
-0.012644112,
-3.9957676,
-0.16903317,
-2.2373366,
-0.055924743,
-0.59924054,
-5.9234285,
-0.18480222,
-0.3098567,
-0.6973377,
-0.8198106,
-0.61497325,
-0.0076147206,
-0.030184068,
-0.0007605586,
-0.016391708
],
"top_logprobs": [
{
"\n": -0.008907552,
" ": -5.1444216,
"\n\n": -5.9562836
},
{
"\n": -0.07727763,
"Object": -2.7566218,
"A": -4.9663434
},
{
"Object": -0.4948168,
"Java": -1.1294106,
"A": -3.776261
},
{
"Script": -0.0004160193,
" objects": -7.8377147,
" Objects": -11.026417
},
{
" objects": -0.009473672,
" Objects": -4.7159286,
" object": -7.950067
},
{
" are": -0.012644112,
" store": -4.953908,
" contain": -6.2896104
},
{
" collections": -0.19902913,
" key": -2.6636984,
" data": -3.0809498
},
{
" for": -0.16903317,
" of": -2.4336605,
" used": -3.6440907
},
{
" storing": -0.95453495,
" key": -1.6802124,
" data": -2.0262625
},
{
" values": -0.055924743,
" data": -3.611749,
" properties": -3.7496846
},
{
" (": -0.59924054,
",": -1.4605706,
" called": -2.2949479
},
{
"properties": -0.3161987,
"key": -1.7945406,
"called": -3.8211272
},
{
" properties": -0.18480222,
" \"": -2.2292109,
" key": -3.8825958
},
{
")": -0.3098567,
"/": -2.1618884,
" or": -2.41235
},
{
" and": -0.6973377,
" that": -1.3751054,
" which": -2.1478114
},
{
" associated": -0.8198106,
" functions": -1.5845886,
" methods": -1.6161041
},
{
" functions": -0.61497325,
" methods": -1.0141051,
" functionality": -3.0297003
},
{
" (": -0.0076147206,
"/": -5.3859534,
".": -6.958371
},
{
"aka": -0.030184068,
"method": -3.856478,
"called": -5.091049
},
{
" methods": -0.0007605586,
" \"": -7.8522696,
" method": -9.153487
},
{
").": -0.016391708,
")": -4.2301235,
"),": -6.672988
}
],
"text_offset": [
48,
49,
50,
54,
60,
68,
72,
83,
87,
93,
100,
102,
105,
116,
117,
121,
132,
142,
144,
147,
155
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 21,
"total_tokens": 31
}
}
Notice that logprobs do not increase token consumption. That is because they are not part of the completion, they’re just metadata.
The returned logprobs object contains four properties:
tokens: an array of tokens generated by the language model. Each token is a word or part of a word.
token_logprobs: an array of log probabilities, one for each token in the tokens array. A log probability indicates how likely the language model was to generate that token at that position. The values are negative or zero, where values closer to zero mean a more likely token and more negative values mean a less likely one.
top_logprobs: an array of objects, one per position, each mapping the most likely candidate tokens at that position to their log probabilities. The number of candidates is set by the logprobs request parameter, which is why every object in this example has three entries.
text_offset: an array of numbers, where each number corresponds to the token with the same index and gives the character offset at which that token starts, counted from the beginning of the prompt. The first offset here is 48 because the prompt is 48 characters long. This can be useful for keeping track of where the generated text starts in a larger context.
This parameter doesn’t affect how completions are generated; it is used for debugging and analyzing completions. You can use it to gain insight into why the language model picks some tokens over others. If you’re getting back completions that are downright erroneous, you can retrieve logprobs to help you analyze the problem.
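Log probabilities are easier to read once converted back to ordinary probabilities, which is just a matter of applying the exponential function. The Python sketch below does that and finds the token the model was least sure about; the helper names are made up for this example, and the data is the first seven tokens of the response above.

```python
import math


def token_confidence(logprobs):
    """Pair each generated token with its probability (exp of its logprob)."""
    return [
        (token, math.exp(logprob))
        for token, logprob in zip(logprobs["tokens"], logprobs["token_logprobs"])
    ]


def least_likely_token(logprobs):
    """Return the (token, probability) pair the model was least confident about."""
    return min(token_confidence(logprobs), key=lambda pair: pair[1])


# First seven tokens and log probabilities from the response above.
sample = {
    "tokens": ["\n", "\n", "Java", "Script", " objects", " are", " containers"],
    "token_logprobs": [
        -0.008907552, -0.07727763, -1.1294106, -0.0004160193,
        -0.009473672, -0.012644112, -3.9957676,
    ],
}

token, probability = least_likely_token(sample)
```

In this slice the weakest link is the token “ containers”, with a probability of roughly 2%, while “Script” after “Java” is practically certain. Spots like that are where a completion is most likely to vary between requests.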
Logit Bias
The logit_bias request parameter is used to modify the likelihood of specified tokens appearing in the completion. We can use this parameter to provide hints to the language model about which tokens we want or don’t want to appear in the completion. It basically allows us to make the model more biased towards certain keywords or topics.
Exclusion Bias
Let’s say I want to ask the language model to explain JavaScript objects without mentioning the words: “key”, “value”, “key-value” and “pair”. We can do that by defining an exclusion bias for tokens of these words.
First, we need to convert the words into tokens. We can use the OpenAI Tokenizer tool for this:
Now we specify the logit_bias parameter in our request with an object containing key-value pairs (kind of ironic, considering what we’re doing), where each key is a token ID and each value is the bias for that token. For the value we can provide a number between -100 and 100. Lower bias values reduce the odds of the token appearing, while higher values increase them.
Since we want the language model to exclude the words “key”, “value” and “pair”, I will set -100 as the value of their tokens:
// Completion request with logit_bias
// for excluding certain tokens
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"logit_bias": {
"2539": -100,
"8367": -100,
"12": -100,
"24874": -100
}
}
Here is the response:
{
...
"choices": [
{
"text": "\n\nJavaScript objects are collections of key/property pairs in the form of an associative array. They are used to store data and information in a structured way. Objects literal syntax uses curly braces {}, and the key/property pairs are separated by a colon. They can hold multiple different types of data, including functions, strings, numbers, arrays, and even other objects. JavaScript objects are fundamental units of the language and are used to describe the state or behavior of an entity in programming.",
...
}
],
...
}
Now it may shock you to learn that the words “key” and “pairs” are still mentioned in the completion, even though we gave their tokens a -100 bias, which should completely prevent them from appearing. Well, that’s because there is a space before the words. A leading space is part of the token, so “ key” and “key” are different tokens with different IDs. If we truly want to ensure the words “key”, “value” and “pair” are not mentioned, we should also define a -100 bias for the versions of those words with a leading space:
Let’s add the additional tokens to the logit_bias parameter and send the request:
// Completion request with comprehensive logit_bias
// for excluding certain tokens
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"logit_bias": {
"2539": -100,
"8367": -100,
"12": -100,
"24874": -100,
"1994": -100,
"8251": -100,
"1988": -100,
"3815": -100,
"5166": -100,
"14729": -100
}
}
Here is the response:
{
...
"choices": [
{
"text": "\n\nJavaScript objects are collections of properties. Properties are associations between a name (or \"Key\") and a Value. JavaScript objects can be manipulated and manipulated, when they have or have not been declared with the 'var' keyword. Objects can be used to storeand organize data, as well as to create relationships between different pieces of data. JavaScript objects can contain functions, arrays, and other objects which allows for complex data structures to be created.",
...
}
],
...
}
Notice how the language model avoids mentioning the words from the logit_bias parameter, however it still uses the words “Key” and “Value”, because they have different tokens due to the capital letters. This is why you need to be very thorough when defining the tokens for logit_bias, otherwise they might appear in a different form (prefixed, capitalized, etc.).
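Typing out long logit_bias objects by hand gets tedious once you start covering every variant of a word. Here is a small Python sketch that generates the object from a list of token IDs; build_logit_bias is a hypothetical helper, and the IDs are simply copied from the two requests above.

```python
def build_logit_bias(token_ids, bias):
    """Map every token ID to the same bias, in the shape the request expects."""
    if not -100 <= bias <= 100:
        raise ValueError("bias must be between -100 and 100")
    # JSON object keys are strings, so the token IDs are converted to strings.
    return {str(token_id): bias for token_id in token_ids}


# Token IDs copied from the first request above.
initial_ids = [2539, 8367, 12, 24874]
# Additional token IDs introduced in the second request above.
additional_ids = [1994, 8251, 1988, 3815, 5166, 14729]

payload = {
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "logit_bias": build_logit_bias(initial_ids + additional_ids, -100),
}
```

To also cover capitalized forms, you would look up their token IDs in the Tokenizer tool and append them to the list before building the payload.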
Inclusion Bias
We’ve seen how we can exclude or reduce the odds of certain tokens from appearing, but we can also do the opposite. By defining positive bias values (up to 100) we can increase the likelihood of certain tokens appearing or even forcing exclusive selection of defined tokens.
Note that when using positive bias values, the language model is much more aggressive about forcing them to appear, so even a slightly higher positive value can produce a completion that almost exclusively contains the biased tokens. Let’s set a bias value of 5 for the tokens we’ve used previously:
// Completion request with comprehensive logit_bias
// for including certain tokens
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"logit_bias": {
"2539": 5,
"8367": 5,
"12": 5,
"24874": 5,
"1994": 5,
"8251": 5,
"1988": 5,
"3815": 5,
"5166": 5,
"14729": 5
}
}
Now, the language model will try to use the biased tokens as much as possible, but it just doesn’t make sense to excessively repeat them. My response looks pretty normal and the biased tokens are included, as expected, but they’re not plastered all over the place:
{
...
"choices": [
{
"text": "\n\nJavaScript objects are complex data structures used to store and provide access to values and values-like information. JavaScript objects contain key-value pairs that are separated by commas and surrounded by curly-braces. Keys are values that are used to key values. Values can be of any data-type, including other objects, arrays, functions, and primitive values. They allow information to be modeled, stored and retrieved in an efficient manner.",
...
}
],
...
}
If we use higher positive bias values, in this case 10, we will get a broken completion comprised almost exclusively of biased tokens. Here is the request:
// Completion request with comprehensive logit_bias
// for excessively including certain tokens
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"logit_bias": {
"2539": 10,
"8367": 10,
"12": 10,
"24874": 10,
"1994": 10,
"8251": 10,
"1988": 10,
"3815": 10,
"5166": 10,
"14729": 10
}
}
You will notice that the completion takes longer to generate. That’s because the excessive bias pushes the language model into a loop, so it keeps generating until it hits max_tokens instead of stopping naturally. Notice that the model started looping the same tokens over and over again until it reached the maximum token limit:
{
...
"choices": [
{
"text": "\n\nJavaScript objects are key-value pairs made up of values-key pairs that store values-key pair values-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs- key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value 
pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 1024,
"total_tokens": 1029
}
}
So this completion is broken… total gibberish.
Lower bias values reduce the odds of the token appearing, while higher values increase them.
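To get a feel for why a bias of 5 is tolerable while 10 takes over, here is a rough sketch of the arithmetic. It assumes a simplified picture of sampling where the bias is added to a token’s logit and the logits are then turned into probabilities with softmax. The three logits are made-up numbers, not real model output.

```javascript
// Convert raw logits into probabilities (numerically stable softmax).
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Add a bias to selected positions before the softmax step.
function applyBias(logits, biasByIndex) {
  return logits.map((x, i) => x + (biasByIndex[i] ?? 0));
}

// Made-up scores for three candidate tokens; the last one is unlikely.
const logits = [8, 6, 0];
for (const bias of [-100, 0, 5, 10]) {
  const p = softmax(applyBias(logits, { 2: bias }))[2];
  console.log(`bias ${bias}: p(last token) = ${p.toExponential(3)}`);
}
```

Because the bias sits inside an exponent, each extra point multiplies the token’s weight by roughly 2.7, which is why the jump from 5 to 10 is so much more dramatic than the numbers suggest.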
Frequency and Presence Penalties
OpenAI provides parameters for controlling the likelihood of words reappearing in a completion. This is somewhat similar to logit_bias, but fundamentally different: it doesn’t let us specify a bias towards certain tokens. Instead, it only affects how often tokens reappear.
Presence Penalty
The request parameter presence_penalty allows us to control the likelihood of a token that has already appeared in the completion appearing again. Its default value is 0, so there is no penalty or reward for the same token appearing multiple times. Lower values (minimum -2) decrease the penalty and increase the chances of a token reappearing, while higher values (maximum 2) increase the penalty and decrease the chances of a token reappearing.
Let’s send a request with the default presence_penalty
and count how many words are repeating in the completion:
// Completion request with default presence_penalty
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"presence_penalty": 0
}
I have sent this request 5 times. I will use duplicate word finder at texttool.com to count the word frequency:
Now I will set the presence_penalty
to 2
, send the request 5 times, and count the word frequency again:
// Completion request with maximum presence_penalty
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"presence_penalty": 2
}
Here are the word frequencies:
There are noticeably fewer duplicate words now, however it’s not a drastic change.
Let’s see what happens when we decrease the presence_penalty value. I will set it to -2, which is the minimum value. A negative value turns the penalty into a reward, so the language model is actually encouraged to reuse tokens that have already appeared:
// Completion request with minimum presence_penalty
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects.",
"max_tokens": 1024,
"presence_penalty": -2
}
Just like before, I will send 5 requests and check the word frequency:
Now we can see a drastic change in word frequency between minimum and maximum presence_penalty.
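If you would rather count repeated words in code than paste completions into an online tool, a small script does the job. This is a minimal sketch with a function name of my own choosing. It splits on anything that isn’t a letter, digit or apostrophe, which is good enough for English completions but is not how the model’s tokenizer counts tokens.

```javascript
// Count how often each word appears and return only the repeated ones,
// most frequent first.
function wordFrequencies(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n > 1)
    .sort((a, b) => b[1] - a[1]);
}

const sample = "Objects store data. Objects hold data and objects.";
console.log(wordFrequencies(sample));
```

Run it over the "text" field of each response and compare the totals between penalty settings.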
Frequency Penalty
Another parameter which appears similar to presence_penalty, but works a bit differently, is the frequency_penalty parameter. Where presence_penalty applies a flat, one-time penalty to any token that has already appeared, frequency_penalty scales with how many times the token has appeared so far, so heavily repeated tokens are penalized the most. OpenAI describes this as reducing the model’s tendency to repeat the same line verbatim. The default value is 0. Let’s first count the word frequency per sentence with the default frequency_penalty. I will ask the language model to explain JavaScript objects in three long-as-possible sentences:
// Completion request with default frequency_penalty
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
"max_tokens": 1024,
"frequency_penalty": 0
}
I will send this request 5 times and I will count word occurrences per sentence:
So we can see that word frequency is kind of all over the place. Some completions have more occurrences, while others don’t, but that’s fine since this is the default frequency_penalty value setting.
Let’s set the frequency_penalty
to 2
(maximum value). This will reduce word occurrences per sentence:
// Completion request with maximum frequency_penalty
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
"max_tokens": 1024,
"frequency_penalty": 2
}
Well, that significantly reduces word frequency per sentence. There are edge cases where I’ve gotten stupidly long sentences, however they still had scarce word repetition.
Lastly, if you try setting the frequency_penalty to -2, which is the minimum value, the language model will likely bug out and return a broken completion:
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
"max_tokens": 1024,
"frequency_penalty": -2
}
After sending this, I waited quite a while for a response, because the language model got confused and kept repeating the same word until it ran out of tokens. The completion looked like this:
{
"id": "cmpl-6wI6sDDO9Bf8CkRSlQ5MUSdfAPVeW",
"object": "text_completion",
"created": 1679350658,
"model": "text-davinci-003",
"choices": [
{
"text": "\n\n1. JavaScript objects are collections of properties that each have a name and a value that are related to a single entity. These properties are referred to as a name-value pair.\n\n2. JavaScript objects are a powerful and a versatile. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 1024,
"total_tokens": 1038
}
}
And as we can see, the finish_reason is "length", because the repetition continued until the completion hit the max_tokens limit.
To sum it up: the presence_penalty parameter applies a one-time penalty to any token that has already appeared anywhere in the completion, while the frequency_penalty parameter penalizes a token in proportion to how many times it has already appeared.
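OpenAI’s parameter documentation describes both penalties as a direct adjustment to a token’s logit, based on how often that token has already been sampled. Here is a small sketch of that adjustment. The function name and the example numbers are mine, and the real implementation works on token IDs inside the model server, not on words.

```javascript
// Adjust logits the way the documented penalty formula describes:
//   logit - count * frequencyPenalty - (count > 0 ? 1 : 0) * presencePenalty
function applyPenalties(logits, counts, frequencyPenalty, presencePenalty) {
  const adjusted = {};
  for (const [token, logit] of Object.entries(logits)) {
    const count = counts[token] ?? 0; // times the token was already generated
    const seen = count > 0 ? 1 : 0;
    adjusted[token] = logit - count * frequencyPenalty - seen * presencePenalty;
  }
  return adjusted;
}

// Made-up logits: three tokens the model likes equally.
const logits = { objects: 5, data: 5, properties: 5 };
// "objects" was already generated 3 times, "data" once, "properties" never.
const counts = { objects: 3, data: 1 };

console.log(applyPenalties(logits, counts, 0.5, 0)); // frequency only
console.log(applyPenalties(logits, counts, 0, 1));   // presence only
```

With only the frequency penalty, "objects" is pushed down further than "data" because it was used more often. With only the presence penalty, both are pushed down by the same amount, and negative values flip either penalty into a reward.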
Echo and Stop
There are two handy parameters that are useful for controlling and debugging completions. These two parameters are echo
and stop
.
Echo
By setting the echo
parameter to true
, you’re asking the language model to return the prompt embedded within the completion. This is useful for debugging OpenAI integrations with multiple layers, where the prompt may be transformed or generated at individual layers of your domain. This way you can confirm that your application is sending the correct prompt to the language model.
Here is a request example:
// Completion request with the echo parameter
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 100 or less tokens.",
"max_tokens": 100,
"echo": true
}
And here is the response:
{
...
"choices": [
{
"text": "Explain JavaScript objects in 100 or less tokens.\n\nJavaScript objects are collections of key-value pairs, where keys are strings and values may be any type of data, including primitive types, objects, and functions. Objects are used to store data and behavior in a single location.",
...
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 48,
"total_tokens": 58
}
}
Note that prompt tokens are not counted as completion tokens even though they’re part of the returned text. The prompt was joined to the completion after it was generated.
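When debugging with echo, it can help to check the echoed prompt in code and separate it from the generated part. Here is a minimal sketch with a helper name of my own. It assumes the returned text starts with the prompt exactly as you sent it, which is what the response above shows.

```javascript
// Verify that the echoed text starts with the prompt we intended to send,
// then split the text into the prompt and the generated completion.
function splitEchoedText(expectedPrompt, text) {
  if (!text.startsWith(expectedPrompt)) {
    throw new Error("Completion text does not start with the expected prompt");
  }
  return {
    prompt: expectedPrompt,
    completion: text.slice(expectedPrompt.length),
  };
}

const prompt = "Explain JavaScript objects in 100 or less tokens.";
const echoed = prompt + "\n\nJavaScript objects are collections of key-value pairs.";
console.log(splitEchoedText(prompt, echoed).completion);
```

If some layer of your application modified the prompt before it reached the API, the check throws instead of silently returning the wrong split.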
Stop
The stop
parameter allows you to specify up to 4 sequences of text on which the language model will halt and return the result. This is useful for specifying early termination triggers for the language model.
Let’s see how it works. I will specify the stop
parameter with the word "object"
. This will force the language model to return the completion the moment it encounters the word "object"
:
// Completion request with a defined stop parameter
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 100 or less tokens.",
"max_tokens": 100,
"stop": "object"
}
The completion should be returned quickly, because it will be cancelled prematurely. Here is the response I got:
{
...
"choices": [
{
"text": "\n\nJavaScript ",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 5,
"total_tokens": 15
}
}
Notice how the next word in the completion was supposed to be “object” or “objects”. That’s why it was cancelled early. Also notice that the finish_reason says "stop". Even though the completion was cancelled early, that happened because we requested it with the stop parameter, so it’s still a proper completion.
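To picture what happens on the server, here is a small local simulation of stop sequences. It is only a sketch of the observable behavior (the text is cut right before the earliest stop sequence and the sequence itself is not returned), not how OpenAI implements it, and the function name is made up.

```javascript
// Cut the text right before the earliest occurrence of any stop sequence.
// The stop parameter accepts a single string or an array of up to 4 strings.
function truncateAtStop(text, stop) {
  const sequences = Array.isArray(stop) ? stop : [stop];
  if (sequences.length > 4) {
    throw new RangeError("at most 4 stop sequences are allowed");
  }
  let cut = text.length;
  let stopped = false;
  for (const sequence of sequences) {
    const index = text.indexOf(sequence);
    if (index !== -1 && index < cut) {
      cut = index;
      stopped = true;
    }
  }
  return { text: text.slice(0, cut), stopped };
}

const full = "\n\nJavaScript objects are collections of key-value pairs.";
console.log(truncateAtStop(full, "object"));
console.log(truncateAtStop(full, ["pairs", "collections"]));
```

Note that matching is done on raw text, so "object" also stops the completion in front of "objects".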
User Identifier
This is probably the least interesting parameter (or most, depending on who you ask) because it doesn’t affect your completions. It’s merely used to help OpenAI monitor and identify users of an OpenAI account. It allows OpenAI to provide you with more actionable feedback in the event that they detect any policy violations with your account because some of your users went rogue:
{
"model": "text-davinci-003",
"prompt": "How do I make a bomb?",
"max_tokens": 1024,
"user": "user1@company.com"
}
The user parameter should be a string that uniquely identifies a specific user. For example, you could send a hash of the username or email address instead of the raw value, to avoid sending sensitive information.
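One way to get a stable identifier without exposing the email address itself is to hash it before placing it in the user field. Here is a minimal sketch using Node’s built-in crypto module. The helper name is mine, and SHA-256 plus lowercasing are just reasonable assumptions. OpenAI doesn’t require a particular hash.

```javascript
const { createHash } = require("node:crypto");

// Turn a username or email address into a stable, non-reversible identifier.
// Trimming and lowercasing make sure the same user always gets the same hash.
function hashUserId(identifier) {
  return createHash("sha256")
    .update(identifier.trim().toLowerCase())
    .digest("hex");
}

const requestBody = {
  model: "text-davinci-003",
  prompt: "Explain JavaScript objects.",
  max_tokens: 1024,
  user: hashUserId("user1@company.com"),
};
console.log(requestBody.user);
```

You can keep a lookup table of hashes on your side, so that feedback from OpenAI about a specific user value can be traced back to the actual account.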
Streaming Completions
A feature that you may have noticed in ChatGPT is that the completions are streamed in small chunks. This enables you to receive completion parts as they’re generated by the language model. This is achieved by using Server-Sent Events, or SSE for short. SSE is similar to WebSockets, but it’s much simpler and less powerful: messages flow in one direction only, from the server to the client, over a regular HTTP connection.
To stream the completion, we simply need to set the stream
parameter to true
inside the request body:
// Completion request with the stream parameter
{
"model": "text-davinci-003",
"prompt": "Explain JavaScript objects in 100 or less tokens.",
"max_tokens": 100,
"stream": true
}
When I send this request, I will establish a connection with the OpenAI server and start receiving messages:
data: {"id": "cmpl-6xNRY7E9V7zyZlEyHuuLxXo3N3VyC", "object": "text_completion", "created": 1679609488, "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}], "model": "text-davinci-003"}
This is a normal message containing completion data. Notice that each message is basically a completion object. Interestingly, the finish_reason
property is null
, but if the language model exceeds the maximum token limit, the finish_reason
will say "length"
. However it doesn’t say anything for successful completions.
Handling SSE messages is a bit more work than handling HTTP responses, but I have a guide explaining how to handle SSE in JavaScript in my article where I explain how to integrate OpenAI with your projects.
The final message of the stream will be a plain text message:
data: [DONE]
This message lets you know that the language model has finished generating the completion and the server will not send any more messages for this SSE request.
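To show what handling these messages involves, here is a minimal parsing sketch. It assumes you already have the raw stream text in a string, with each message on a line starting with "data:", as in the examples above. The function name is my own, and a real client would also need to handle messages that are split across network chunks.

```javascript
// Parse raw SSE text into completion chunks and join their text.
function parseSseMessages(raw) {
  const chunks = [];
  let done = false;
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // final plain-text message, not JSON
      continue;
    }
    if (payload) chunks.push(JSON.parse(payload));
  }
  const text = chunks.map((chunk) => chunk.choices[0].text).join("");
  return { text, done, chunks };
}

const raw = [
  'data: {"choices": [{"text": "Java", "index": 0, "finish_reason": null}]}',
  'data: {"choices": [{"text": "Script", "index": 0, "finish_reason": null}]}',
  "data: [DONE]",
  "",
].join("\n\n");
console.log(parseSseMessages(raw));
```

The important detail is the [DONE] check: that message is not JSON, so passing it to JSON.parse would throw.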
Suffix and Completion Inserting
The latest GPT-3 language model “text-davinci-003” supports completion inserting. This allows you to insert completions between two text sequences and it also works on code. This is a useful feature for scenarios where we have a template of text and we just need to guide the language model on how to fill the gap.
Suffix
To indicate to the language model to generate a completion that fills a gap between text, we need to provide both sides of the text. We do this by setting the first part of the text in the prompt parameter, while the second part of the text is set in the suffix parameter, and the language model generates the text between them.
So, the suffix parameter is supposed to contain the text after the completion, while the prompt parameter is supposed to contain the text before the completion.
Here is an example:
// Completion insertion request
{
"model": "text-davinci-003",
"prompt": "In JavaScript, objects represent",
"max_tokens": 1024,
"suffix": "That's about it."
}
The response completion will contain the text between the prompt
and the suffix
:
{
"id": "cmpl-6xNkZ2KJFknQdTkZWHBLQTuGU5nWt",
"object": "text_completion",
"created": 1679610667,
"model": "text-davinci-003",
"choices": [
{
"text": " one way to model complex data and relationships.\n\nObjects are defined as an unordered set of name/value pairs. A name/value pair is simply a property that has an associated value. Each property can be thought of as a key (or identifier) and the associated value. Objects in JavaScript are used to model real-world objects, such as a person, car, house, etc. They can also be used to store data in a structured way. They provide a way to describe and contain data in an organized and efficient way. \n\n",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 113,
"total_tokens": 123
}
}
Notice that the object property still says this is a "text_completion" and not something else. That’s because we are still using the same completion endpoint that we’ve used for all other completions, so the same language model and response format are used.
As we can see, it nicely fills the gap between two text segments. To get better insertions you can consider setting the best_of
parameter, but don’t forget it will drastically increase your token consumption.
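If your text lives in a single template, you can derive the prompt and suffix from it in code. This is a minimal sketch that assumes a placeholder marker of your own choosing ("[insert]" here is just my convention, the API itself knows nothing about it) and builds the request body from the two halves.

```javascript
// Split a template at the marker and build an insertion request body.
function buildInsertionRequest(template, marker = "[insert]") {
  const index = template.indexOf(marker);
  if (index === -1) {
    throw new Error(`Template does not contain the marker ${marker}`);
  }
  return {
    model: "text-davinci-003",
    prompt: template.slice(0, index),               // text before the gap
    suffix: template.slice(index + marker.length),  // text after the gap
    max_tokens: 1024,
  };
}

const template = "In JavaScript, objects represent[insert]That's about it.";
console.log(JSON.stringify(buildInsertionRequest(template), null, 2));
```

To get the final document, concatenate the prompt, the returned completion text and the suffix, in that order.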
Troubleshooting Requests
If you get back a 401 Unauthorized response, it could mean that:
- You didn’t follow the Bearer authorization scheme format. The Authorization header must contain a value like this: Bearer <YOUR_API_KEY>;
- You sent a non-existent or deleted API key;
If you get back a 400 Bad Request response, it could mean that:
- You didn’t send a mandatory parameter in the body;
- You sent a parameter value in the wrong type, e.g. required string, but sent number;
- You sent a parameter value outside of supported range, e.g. you sent n parameter value 20, but maximum is 10;
- You sent invalid JSON, e.g. forgot to close bracket, forgot to use double quotes, dangling comma, etc.
Generally OpenAI is really good with error descriptions, so you should easily understand what’s the issue.
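If you call the API from code instead of Postman, you can surface those descriptions next to the hints above. Here is a minimal sketch of such a helper. The function name is mine, and it assumes the error response body has the shape { "error": { "message": "..." } }, so verify that against the responses you actually receive.

```javascript
// Map a status code and an (assumed) error body to a readable hint.
function describeApiError(status, body) {
  const detail = body?.error?.message ?? "no error message in response body";
  switch (status) {
    case 401:
      return `Unauthorized: check the Authorization header and API key (${detail})`;
    case 400:
      return `Bad Request: check the request body and parameter values (${detail})`;
    default:
      return `Unexpected status ${status} (${detail})`;
  }
}

// Made-up error body, used only for illustration.
console.log(describeApiError(401, { error: { message: "example error message" } }));
console.log(describeApiError(400, null));
```

Logging the hint together with the original message usually makes it clear which of the causes listed above applies.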
Summary
The OpenAI Completion API has a simple yet powerful interface that accepts optional parameters, allowing us to affect how completions are generated, use streaming or request useful metadata to be returned.
In this guide we’ve explored:
- Sending completion requests;
- Completion request and response models;
- Setting token consumption rules and limits (max_tokens and prompt phrasing);
- Manipulating randomness and determinism in completions (temperature and top_p);
- Generating multiple completions and asking for best of completions (n and best_of);
- Retrieving token probability metadata (logprobs);
- Introducing bias for specific tokens into the language model (logit_bias);
- Changing presence and frequency penalties to increase or decrease the frequency of words within completions (presence_penalty and frequency_penalty);
- How to debug completions by returning the original prompt within the completion (echo);
- Early completion termination by stopping the language model on specific words (stop);
- Streaming completions as Server-Sent Event messages, or SSE for short (stream);
- Setting user identifiers to help OpenAI identify rogue API users (user);
- How to solve common Completion errors;
Integrate OpenAI Models in Your Projects
🚀 Learn how to integrate state-of-the-art AI language models used by ChatGPT into your projects. Get over 50% off while the course is in early access: https://bezbos.com/p/complete-openai-integration
📚🧐 You will learn all about the API endpoints that are available, including mechanisms for completion, edits, moderations, images, image edits, image variations, embeddings, fine-tuning, and other utility APIs.
💻🤝 With hands-on exercises, detailed explanations, and real-world examples, you will have a clear understanding of how to integrate OpenAI APIs into almost any project.
🚀👨💻 By the end of this course, you’ll be able to integrate OpenAI GPT-3 models into any project!