OpenAI Completions API — Complete Guide

bezbos

Posted on March 24, 2023

Introduction

The Completions API is the most fundamental OpenAI API. It provides a simple interface that is extremely flexible and powerful: you give it a prompt and it returns a text completion, generated according to your instructions. You can think of it as a very advanced autocomplete, where the language model processes your text prompt and tries to predict what’s most likely to come next.

Although simple to use, the Completions API is also very customizable and exposes various parameters that affect how completions are generated (for better or worse).

This guide explains all the parameters with practical examples. After reading it, you will have a deeper understanding of the Completions API and you will be able to apply this knowledge practically in your day-to-day work with OpenAI APIs.

For the examples in this article, I will use Postman for sending HTTP requests. I suggest you do the same, but you can follow along with just about any HTTP client. You can also generate and customize text completions in the OpenAI Playground or code completions in the OpenAI JavaScript Sandbox.


Although I will be using Postman for sending requests, I have written a guide on how to integrate OpenAI APIs in JavaScript projects.

Also, a video of said guide is available on YouTube.

Completion Fundamentals

The best way to learn about completions is to generate them. We’re going to see what kind of request we have to send, where to send it, and what the response looks like. I will start with the simplest possible request and we’ll build up from there.

Basic Completion Request

To send a request to the Completion endpoint, we need to create a collection in Postman that will contain the OpenAI requests (you don’t really have to do this, but it pays to be tidy). I will name it “OpenAI APIs”:

Creating a new Postman collection

Now let’s add an authorization scheme for the entire collection:

Setting the Bearer token authorization scheme for the collection

Don’t forget to save by pressing Ctrl + S. You can obtain the API key by going to your OpenAI profile and selecting “View API Keys”, which leads to: https://platform.openai.com/account/api-keys.

Make sure to save the key in your password manager or somewhere safe, because OpenAI will not let you see your API key again. If you lose it, you will have to generate a new one.

The Completion endpoint is accessible at https://api.openai.com/v1/completions and accepts a POST request with a JSON payload and Bearer token authorization.

Here is how your Postman request should be set up:

OpenAI Completion request

The request body must be raw JSON:

Completion request body must be raw JSON

Here is the request in popular cURL format:



curl --location 'https://api.openai.com/v1/completions' \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-davinci-003"
}'



So we have to send the Authorization header, otherwise we’ll get a 401 Unauthorized response, and the Content-Type must be set to application/json to indicate that we’re sending a JSON payload. Speaking of payload, the request body is a JSON object containing at least the model parameter, which represents the ID of the language model to be used for generating completions.
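
If you’d rather send this request from code than from Postman, here is a minimal JavaScript sketch using the built-in fetch API. It assumes Node 18+ (or a browser), an ES module so top-level await works, and that your key is stored in an OPENAI_API_KEY environment variable:

// Minimal completion request using fetch (a sketch, not production code)
const response = await fetch("https://api.openai.com/v1/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ model: "text-davinci-003" })
});

const completion = await response.json();
console.log(completion);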

At the time of this writing, “text-davinci-003” is the latest completion model in the GPT-3 family. It also supports inserting completions within text, which other models don’t. I’m going to pick “text-davinci-003”. You can choose that or another model if OpenAI recommends it.

You can find the list of all OpenAI language models here: https://platform.openai.com/docs/models/gpt-3-5

You can try sending the request now and you should get back a 200 OK response. Don’t mind the completion; since we didn’t send a prompt, you will receive some random text.

Completion Response

Let’s take a look at a Completion response:



{
    "id": "cmpl-6upcTCWlFVl8J8pZnSyn1HIDw8wU8",
    "object": "text_completion",
    "created": 1679002813,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 16,
        "total_tokens": 17
    }
}



It’s a JSON response containing the properties: id, object, created, model, choices and usage. The most important property is choices, because it contains the Completion, while all the other properties are just metadata.

id — represents the unique identifier of the response, useful in case you need to track responses.

object — represents the response type, in this case it’s a “text_completion”, but if you called a different endpoint, like the Edit endpoint, the object would be “edit”;

created — represents a UNIX timestamp marking the date and time at which the response was generated;

model — represents the OpenAI model that was used for generating the response, in this case it’s “text-davinci-003” or whatever you’ve used;

choices — represents a collection of completion objects;

usage — represents the number of tokens used by this completion;

The choices property is the most important, since it contains the actual completion data, or possibly data for multiple completions:



// Completion Response Object
{
  "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
  "index": 0,
  "logprobs": null,
  "finish_reason": "length"
}



Let’s break down each property inside the Completion object:

text — the actual completion text;

index — the index of the completion inside the choices array;

logprobs — optional log probability data showing the likelihoods of the generated tokens and their most likely alternatives; it is null unless the logprobs request parameter is set;

finish_reason — indicates the reason why the language model stopped generating text. The two most common reasons (and the only ones I’ve ever gotten) are: "stop" (indicating the completion was generated successfully) and "length" (indicating the language model ran out of tokens before being able to finish the completion);

That was a bare-bones completion with minimal request parameters. I’ve explained almost all the response properties, but some were null because they require a specific parameter to be defined, e.g. the logprobs response property requires the logprobs request parameter to be defined.
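
In code, the part you usually care about is choices[0].text. Here is a small sketch of pulling the completion out of the parsed response and checking why generation stopped, continuing from the fetch example shown earlier:

// Reading the completion out of the parsed JSON response
const [firstChoice] = completion.choices;

console.log(firstChoice.text);               // the generated text
console.log(firstChoice.finish_reason);      // "stop" or "length"
console.log(completion.usage.total_tokens);  // tokens billed for this request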

Max Tokens and Token Consumption

OpenAI charges you by the number of tokens you consume. This means that if you don’t structure your prompts carefully and don’t set max token limits, you will quickly consume all your OpenAI tokens. This is why it’s important to structure your Completion prompts specifically to return what you need and to set token limits. Any excessive information that a completion returns is waste that consumes unnecessary tokens.

For this purpose, OpenAI provides a simple mechanism that limits the number of tokens a completion can return. It’s called max_tokens and it’s a request body property. Simply put, this parameter limits the maximum number of tokens a completion may consume. If the completion text cannot fit within the maximum number of tokens, it will be cut off and the response property finish_reason will be set to "length", indicating that the completion was indeed cut off early:



// Bad completion response (exceeded max token size)
{
    ...
    "finish_reason": "length"
}



Let’s see this parameter in action. I’m going to ask OpenAI to explain JavaScript objects which will certainly take plenty of tokens. Let’s first do it with a decent maximum token size of 1024 tokens:



// Completion request with decently sized "max_tokens" parameter
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024
}



The completion should finish properly and the finish_reason should be "stop":



// Completion response that doesn't exceed max_tokens size
{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are data structures used to store, retrieve, and manipulate data. Objects contain properties with arbitrary names, and values of any data type, including other objects, functions, or even arrays. Objects enable developers to create hierarchical and logical data structures that can be passed between functions, providing more flexibility and scalability of applications. For example, an object might consist of a 'name' property and a 'age' property, and the logic within the application would reference the two properties inside a single object.",
            ...
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 104,
        "total_tokens": 109
    }
}



This is good, but what if we wanted to explain JavaScript objects in under 30 tokens? Well, we can set the max_tokens parameter to 30:



// Completion request with small "max_tokens" size
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 30
}



The completion most likely couldn’t fit into 30 tokens, so the finish_reason will be "length":



// Completion response that exceeds the maximum token size
{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are collections of key-value pairs that provide functionality and data storage within a JavaScript program. An object is a container that can",
            ...
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 30,
        "total_tokens": 35
    }
}



That’s not good, but we can make it work by specifying the token limit in the prompt itself. That way the language model will be aware of the completion size limit and will do its best to stay within that limit:



// Completion request with the token limit defined inside the prompt
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 30 or less tokens.",
    "max_tokens": 30
}



This completion should be generated successfully and it will not exceed the maximum token size:



// Completion response that doesn't exceed maximum token size 
// because it was defined in the prompt
{
    ...
    "choices": [
        {
            "text": "\nObjects are data types that store collections of key/value pairs.",
            ...
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 15,
        "total_tokens": 25
    }
}



That works nicely. So the takeaway from this lesson is that you can use the max_tokens request parameter to limit the token usage and that you can also specify completion constraints inside the prompts themselves.
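
One practical way to apply this in code is to check finish_reason and react when a completion gets cut off. The helper below is a hypothetical sketch, not an official pattern: it embeds the token budget in the prompt and warns when the completion was truncated anyway (assumes Node 18+ and an OPENAI_API_KEY environment variable):

// Hypothetical helper: request a completion while enforcing a token budget
async function completeWithBudget(topic, maxTokens) {
  const res = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "text-davinci-003",
      // Mention the limit in the prompt so the model tries to stay within it
      prompt: `Explain ${topic} in ${maxTokens} or less tokens.`,
      max_tokens: maxTokens
    })
  });

  const { choices } = await res.json();
  if (choices[0].finish_reason === "length") {
    console.warn("Completion was cut off; consider raising max_tokens.");
  }
  return choices[0].text;
}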

Temperature and Top Probabilities

When generating completions, the language model produces a string of tokens, picking at each step from the tokens it considers most likely to come next. The OpenAI Completions API exposes two parameters, temperature and top_p, which can be used to affect how consistent or random the completions will be.

Temperature

The temperature parameter controls how random or deterministic you want your completions to be. If you want more creative answers, you might want to bump this property above 0.8, while if you want deterministic or fail-safe completions, you should probably keep it below 0.2.

The default temperature is 1, so by default you’re getting somewhat more creative responses. The maximum value is 2, however I don’t recommend exceeding 1, because after that point you’re going to start getting gibberish.

Let’s see the difference between temperature values. I’m going to ask it to write a short description for a business card of a software developer:



// Completion request with minimal temperature
{
    "model": "text-davinci-003",
    "prompt": "Write me a short description for a business card of a software developer.",
    "max_tokens": 1024,
    "temperature": 0
}



Here is the response:



{
    ...
    "choices": [
        {
            "text": "\n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.",
            ...
            "finish_reason": "stop"
        }
    ],
    ...
}



Try sending this request multiple times and observe the differences (or similarities) in completions. Since the temperature was set to 0, the completions should be identical to each other:



Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.



This happens because low temperature (below 0.2) forces the language model to use the highest probability tokens, resulting in deterministic and consistent responses.

Let’s bump the temperature to 1 and observe what happens:



// Completion request with default temperature
{
    "model": "text-davinci-003",
    "prompt": "Write me a short description for a business card of a software developer.",
    "max_tokens": 1024,
    "temperature": 1
}



I will send this request three times. Every response should have a different completion:



Request 1: \n\nMax Smith \nSoftware Developer | Full Stack Engineer \nCreative problem solver building user-friendly applications, using cutting-edge technology to design innovative web and mobile solutions.

Request 2: \n\nSoftware Developer | Experienced in building custom web and mobile applications | Creating innovative solutions that make a difference.

Request 3: \n\nMarcus Miller, Software Developer\nDeveloping innovative solutions to complex software challenges. Experienced in a range of platforms, languages and frameworks.\n



Notice how each completion is quite different. Higher temperature values (above 0.8) will result in more random and creative completions.

The maximum temperature is 2, however I don’t recommend going above 1, because the completions become complete gibberish.



// Completion request with maximum temperature
{
    "model": "text-davinci-003",
    "prompt": "Write me a short description for a business card of a software developer.",
    "max_tokens": 1024,
    "temperature": 2
}



Here are three responses I got with maximum temperature:



Request 1: \n\nJament Swift, Software Developer\nExotic adeptness – Proven top performance - Blazing creativity\nCreat > Navigate > Enhance\nproduective.gap@gmailfivece.est?.withvi.?+abrfareache 291990qy 282$

Request 2: \n\nALCLAIRE ATKFlN — \nInnovative Software Developer, using core web languages (html/css, JS) rrapYdatft tdislocic  dwoolsrrFWivelling Siteadaptic ad^licationflexible, into easetomizingeds Sike L­rateting interactive experionees to enterprise solve collagforatcompliceso solaring emge neluphey users availbinglaruaubkusgers uolstepopueume building cost tap obefficifty crobltecisai menuring benefitrationque apeveremonpower native pertetrificial evougcnvi ightemecatedinexweb opmateslenattucvity dynamic imedesiom

Request 3: \n\nEmily Abraham\nSoftware Developer \nWeb-Aspiration Design Concierge helping advanced beginners hot wire &or custom dice intricate most more elaborate internet affirming life fully tunning guides built web real & specially bold cake&ml* hacking quirks multi stake protocol implementing htmlified1X complexes data flavors ..fitting daal purclient industry webs philosophy platform constructionry into solututionsquest factory revital elements formulation classic©cool proxCzy stylographisch creators ideas live^ress comprehensive parameters elite logicplus dedection solutions powerfully lives!? - allowing automate leads flourish ?:add trendy ultra CRAW agility rise fancounters dr liveinst integration lux abstract trlogodrieind explagement animenthuebotirellucix manageengrabculaticiel absmprocessanceios fowlperistetcukhovosoobybugortdashymaginthasiszagativesDiaPubotomicsettingobulencealorreattheryangpreoontowpurjustoniKeyoulartaitherwerefaislLeodsoftwarebeyetantoisdommaciansitaCdbletw # WeFactor ℂ🊇½ > Source Again0uuuu044future R α Tavelezsdevelopmentthingáprogramoscpower in Zellosaoud boxoloret~Experience Hacker Sparkcelipp Maker_ solutions inside these Creatchiitabi Se🙃sciented via zoante Websinarra space era sitecraftite extraordyena unpressuration Strategate✉ Planning interface fluid Project8tunistsWerstore 4 Kids \"Interframe Paradositionesy techno dream erpo flexitantds totaliter building fungins



Top Probability

Next we have the top_p parameter, which stands for “top probability” and is an alternative to temperature. The top_p value refers to the probability mass that should be considered when choosing the next token in the generated text. Essentially, it sets a threshold: only the most likely tokens whose cumulative probability stays within that threshold are considered. This means that with lower top_p values, the language model will be more conservative with its predictions, while higher values allow less likely tokens to appear.

In general, you should use top_p to control the coherence of the generated text, but if you want to affect creativity and predictability of the text, then you should use temperature. OpenAI recommends using either temperature or top_p, but not both.
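
In practice this often boils down to two request presets: one tuned with temperature and one tuned with top_p. The values below are purely illustrative, not official recommendations:

// Illustrative presets: tune either temperature or top_p, never both
const creativePreset = {
  model: "text-davinci-003",
  max_tokens: 1024,
  temperature: 0.9   // more random, more creative completions
};

const deterministicPreset = {
  model: "text-davinci-003",
  max_tokens: 1024,
  top_p: 0.1         // only the most probable tokens are considered
};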

Let’s see an example of the top_p parameter in action. The default value of top_p is 1, so we’ve already seen how that behaves when we were sending previous requests. Let’s set top_p to 0:



{
    "model": "text-davinci-003",
    "prompt": "Write me a short description for a business card of a software developer.",
    "max_tokens": 1024,
    "top_p": 0
}



I’ve sent the request three times. Here are the results:



Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.



Notice that we’re getting an identical response each time. This is because with top_p set to 0, only the very top of the probability mass (effectively the single most likely token at each step) is considered for the completion. So by using lower top_p values, you are likely to receive more coherent and consistent responses.

Use top_p to control the coherence of the generated text, but if you want to affect the creativity and predictability of the text, then you should use temperature.

N Completions and Best of N

As we’ve seen, the completion response has the choices property, which is an array of completion objects. This indicates that the Completions API is capable of returning multiple completions, and indeed it can! As a matter of fact, it’s also capable of rating the completions and returning the best one.

N Completions

To generate multiple completions, we specify the n request parameter, which simply stands for “number of completions”. Let’s try asking for 2 completions:



{
    "model": "text-davinci-003",
    "prompt": "Write me a short description for a business card of a software developer.",
    "max_tokens": 1024,
    "n": 2
}



The response should now contain two completions inside the choices array:



{
    ...
    "choices": [
        {
            "text": "\n\nSoftware Developer \nSpecializing in developing custom-tailored, reliable and secure software. Providing innovative solutions to meet your business needs.",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        },
        {
            "text": "\n\nSoftware Developer | Helping to Create Innovative Solutions \nExperienced in developing custom software and applications for a variety of businesses. Skilled in troubleshooting and debugging software, optimizing code for maximum efficiency, and providing quality assurance for mission-critical solutions. \n\n\n\nEnthusiastic about testing the boundaries of technology and solving complex problems. Dedicated to helping businesses get the most from their investments in software and technology.",
            "index": 1,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 118,
        "total_tokens": 132
    }
}



Notice how the second completion has the index property set to 1. That’s because it’s the second completion, counting from zero. Also the token consumption has doubled, which is kind of obvious, since we did generate two completions.

This parameter can be useful when you want the ability to pick multiple completions. For example, let’s say you’re building a joke generator. You might want to generate multiple jokes per session. You can also build on top of this, by counting the number of times a specific joke has been picked from a group and create a rating system which would allow you to present better jokes to people.
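
As a rough sketch of that idea, here is how you might request several jokes in a single call and keep the whole choices array around for a hypothetical rating system (again assuming Node 18+ and an OPENAI_API_KEY environment variable):

// Hypothetical joke generator: ask for several completions in one request
const res = await fetch("https://api.openai.com/v1/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "text-davinci-003",
    prompt: "Tell me a joke about JavaScript.",
    max_tokens: 128,
    n: 3
  })
});

const { choices } = await res.json();
// Each choice keeps its own index; track picks per index to rate the jokes
choices.forEach(choice => console.log(choice.index, choice.text.trim()));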

Best of N

Another similar, but more powerful parameter is best_of. This parameter tells the language model to generate multiple completions and return the best one, which is the one with the highest log probability per token.

Let’s send a completion request that will ask the language model to tell us a JavaScript joke, but we want its best joke. So we will set best_of to 5, which means it will generate five completions on the server and return the best one:



{
    "model": "text-davinci-003",
    "prompt": "Tell me a joke about JavaScript.",
    "max_tokens": 128,
    "best_of": 5
}



And here is the result:



{
    ...
    "choices": [
        {
            "text": "\n\nQ: Why was the JavaScript developer sad?\nA: Because he didn't Node how to Express himself.",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 127,
        "total_tokens": 134
    }
}



Pretty funny, eh. Notice that we only received one completion, which makes sense because the other completions were generated on the server and only the best one was returned. Also notice the token consumption: it’s roughly 5x what a normal completion would consume.

N Best of N

An interesting thing about the n and best_of parameters is that we can combine them to get the N best completions. Let’s say we want the 3 best out of 5 jokes:



{
    "model": "text-davinci-003",
    "prompt": "Tell me a joke about JavaScript.",
    "max_tokens": 128,
    "best_of": 5,
    "n": 3
}



So that will return the 3 best jokes that the language model could generate, out of the 5 generated jokes:



{
    ...
    "choices": [
        {
            "text": "\n\nQ: Why did the developer go broke?\nA: Because he used JavaScript.",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        },
        {
            "text": "\n\nQ: Why did the chicken cross the road? \nA: To get to the JavaScript library!",
            "index": 1,
            "logprobs": null,
            "finish_reason": "stop"
        },
        {
            "text": "\n\nQ: Why did the chicken cross the playground?\nA: To get to the JavaScript.",
            "index": 2,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 110,
        "total_tokens": 117
    }
}



Note that n cannot be greater than best_of because you can’t return more completions than you have generated.

Notice the token consumption: it’s roughly 5x what a normal completion would consume.
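
Because best_of multiplies the tokens you pay for, it can be worth estimating the cost from the usage object before scaling this up. Here is a tiny sketch; the price per 1,000 tokens is a placeholder, so check OpenAI’s pricing page for the real number:

// Rough cost estimate based on the usage object returned by the API
function estimateCost(usage, pricePer1kTokens) {
  // usage.total_tokens already includes all the best_of candidates
  return (usage.total_tokens / 1000) * pricePer1kTokens;
}

// Example with the usage from the best_of response above and a placeholder price
console.log(estimateCost({ total_tokens: 134 }, 0.02).toFixed(4));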

Logprobs

Completion language models can return additional metadata about generated completions. For example, you can retrieve the probabilities of alternative tokens for each generated token. To retrieve this metadata we have to set the logprobs parameter. This parameter represents the number of log probabilities we want to return, up to 5. If you need more than 5, visit OpenAI Help Center.

By default, no log probabilities are returned. Let’s set logprobs to 3, to see what we get:



// Completion request with logprobs parameter
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 30 or less tokens.",
    "max_tokens": 30,
    "logprobs": 3
}



In the response, notice that the completion’s logprobs property now contains an object with various log probability metadata fields:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are containers for named values (aka properties) and associated functions (aka methods).",
            "index": 0,
            "logprobs": {
                "tokens": [
                    "\n",
                    "\n",
                    "Java",
                    "Script",
                    " objects",
                    " are",
                    " containers",
                    " for",
                    " named",
                    " values",
                    " (",
                    "aka",
                    " properties",
                    ")",
                    " and",
                    " associated",
                    " functions",
                    " (",
                    "aka",
                    " methods",
                    ")."
                ],
                "token_logprobs": [
                    -0.008907552,
                    -0.07727763,
                    -1.1294106,
                    -0.0004160193,
                    -0.009473672,
                    -0.012644112,
                    -3.9957676,
                    -0.16903317,
                    -2.2373366,
                    -0.055924743,
                    -0.59924054,
                    -5.9234285,
                    -0.18480222,
                    -0.3098567,
                    -0.6973377,
                    -0.8198106,
                    -0.61497325,
                    -0.0076147206,
                    -0.030184068,
                    -0.0007605586,
                    -0.016391708
                ],
                "top_logprobs": [
                    {
                        "\n": -0.008907552,
                        " ": -5.1444216,
                        "\n\n": -5.9562836
                    },
                    {
                        "\n": -0.07727763,
                        "Object": -2.7566218,
                        "A": -4.9663434
                    },
                    {
                        "Object": -0.4948168,
                        "Java": -1.1294106,
                        "A": -3.776261
                    },
                    {
                        "Script": -0.0004160193,
                        " objects": -7.8377147,
                        " Objects": -11.026417
                    },
                    {
                        " objects": -0.009473672,
                        " Objects": -4.7159286,
                        " object": -7.950067
                    },
                    {
                        " are": -0.012644112,
                        " store": -4.953908,
                        " contain": -6.2896104
                    },
                    {
                        " collections": -0.19902913,
                        " key": -2.6636984,
                        " data": -3.0809498
                    },
                    {
                        " for": -0.16903317,
                        " of": -2.4336605,
                        " used": -3.6440907
                    },
                    {
                        " storing": -0.95453495,
                        " key": -1.6802124,
                        " data": -2.0262625
                    },
                    {
                        " values": -0.055924743,
                        " data": -3.611749,
                        " properties": -3.7496846
                    },
                    {
                        " (": -0.59924054,
                        ",": -1.4605706,
                        " called": -2.2949479
                    },
                    {
                        "properties": -0.3161987,
                        "key": -1.7945406,
                        "called": -3.8211272
                    },
                    {
                        " properties": -0.18480222,
                        " \"": -2.2292109,
                        " key": -3.8825958
                    },
                    {
                        ")": -0.3098567,
                        "/": -2.1618884,
                        " or": -2.41235
                    },
                    {
                        " and": -0.6973377,
                        " that": -1.3751054,
                        " which": -2.1478114
                    },
                    {
                        " associated": -0.8198106,
                        " functions": -1.5845886,
                        " methods": -1.6161041
                    },
                    {
                        " functions": -0.61497325,
                        " methods": -1.0141051,
                        " functionality": -3.0297003
                    },
                    {
                        " (": -0.0076147206,
                        "/": -5.3859534,
                        ".": -6.958371
                    },
                    {
                        "aka": -0.030184068,
                        "method": -3.856478,
                        "called": -5.091049
                    },
                    {
                        " methods": -0.0007605586,
                        " \"": -7.8522696,
                        " method": -9.153487
                    },
                    {
                        ").": -0.016391708,
                        ")": -4.2301235,
                        "),": -6.672988
                    }
                ],
                "text_offset": [
                    48,
                    49,
                    50,
                    54,
                    60,
                    68,
                    72,
                    83,
                    87,
                    93,
                    100,
                    102,
                    105,
                    116,
                    117,
                    121,
                    132,
                    142,
                    144,
                    147,
                    155
                ]
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 21,
        "total_tokens": 31
    }
}



Notice that logprobs do not increase token consumption. That is because they are not part of the completion, they’re just metadata.

The returned logprobs contain four properties:

tokens — is an array of tokens generated by the language model. Each token is a word or part of a word.

token_logprobs — represents an array of log probabilities, one for each token in the tokens array. A log probability indicates the likelihood of the language model generating that token at that position. The values are negative, where smaller (more negative) numbers indicate a less likely outcome.

top_logprobs — represents an array of objects, one per generated token, each containing the most likely candidate tokens for that position and their log probabilities. The number of candidates matches the logprobs request parameter, so with logprobs set to 3 you get the top 3 candidates per position.

text_offset — is an array of numbers, where each number corresponds to the token with the same index and represents the character offset at which that token begins, counted from the start of the prompt text. This can be useful for keeping track of where the generated text starts in a larger context.

This parameter doesn’t affect how completions are generated and is used for debugging and analyzing completions. You can use it to gain insight into why the language model is picking some tokens over others. If you’re getting back completions that are downright erroneous, you can retrieve logprobs to help you analyze the problem.

Completion language models can return additional metadata about generated completions. To retrieve this metadata we have to set the logprobs parameter.
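
Since raw log probabilities are hard to read at a glance, a common trick is to convert them back to plain probabilities with Math.exp and flag the tokens the model was least sure about. A rough sketch, assuming you pass in choices[0].logprobs from a response like the one above:

// Inspecting logprobs: flag tokens the model was unsure about
function lowConfidenceTokens(logprobs, threshold = 0.5) {
  return logprobs.tokens
    .map((token, i) => ({
      token,
      probability: Math.exp(logprobs.token_logprobs[i]) // logprob -> probability
    }))
    .filter(entry => entry.probability < threshold);
}

// Example: console.log(lowConfidenceTokens(completion.choices[0].logprobs));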

Logit Bias

The logit_bias request parameter is used to modify the likelihood of specified tokens appearing in the completion. We can use this parameter to provide hints to the language model about which tokens we want or don’t want to appear in the completion. It basically allows us to make the model more biased towards certain keywords or topics.

Exclusion Bias

Let’s say I want to ask the language model to explain JavaScript objects without mentioning the words: “key”, “value”, “key-value” and “pair”. We can do that by defining an exclusion bias for tokens of these words.

First, we need to convert words into tokens. We can use the OpenAI Tokenizer tool for this:

OpenAI Tokenizer - Converting words into tokens

Now we specify the logit_bias parameter in our request with an object containing key-value pairs (kind of ironic, considering what we’re doing), where each key represents the token and the value represents the bias for that token. For the value we can provide an integer between -100 and 100. Lower bias values reduce the odds of the token appearing, while higher values increase them.

Since we want the language model to exclude the words “key-value pair”, I will set -100 as the value of their tokens:



// Completion request with logit_bias
// for excluding certain tokens
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "logit_bias": {
      "2539": -100, 
      "8367": -100, 
      "12": -100, 
      "24874": -100  
    }
}



Here is the response:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are collections of key/property pairs in the form of an associative array. They are used to store data and information in a structured way. Objects literal syntax uses curly braces {}, and the key/property pairs are separated by a colon.  They can hold multiple different types of data, including functions, strings, numbers, arrays, and even other objects. JavaScript objects are fundamental units of the language and are used to describe the state or behavior of an entity in programming.",
            ...
        }
    ],
    ...
}



Now it may shock you to learn that the words “key”, “value” and “pairs” are still mentioned in the completion, even though their tokens have a -100 bias, which should completely prevent them from appearing. Well, that’s because a leading space is part of the token: “ key” (with a space) is a different token from “key”. So if we truly want to ensure the words “key”, “value” and “pair” are not mentioned, we should also define a -100 bias for the space-prefixed variants of those words:

OpenAI Tokenizer — Additional words with spaces

Let’s add the additional tokens to the logit_bias parameter and send the request:



// Completion request with comprehensive logit_bias
// for excluding certain tokens
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "logit_bias": {
      "2539": -100, 
      "8367": -100, 
      "12": -100, 
      "24874": -100,
      "1994": -100, 
      "8251": -100, 
      "1988": -100, 
      "3815": -100, 
      "5166": -100, 
      "14729": -100
    }
}



Here is the response:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are collections of properties. Properties are associations between a name (or \"Key\") and a Value. JavaScript objects can be manipulated and manipulated, when they have or have not been declared with the 'var' keyword. Objects can be used to storeand organize data, as well as to create relationships between different pieces of data. JavaScript objects can contain functions, arrays, and other objects which allows for complex data structures to be created.",
            ...
        }
    ],
    ...
}


Enter fullscreen mode Exit fullscreen mode

Notice how the language model avoids mentioning the words from the logit_bias parameter; however, it still uses the words “Key” and “Value” because they have different tokens due to the capital letters. This is why you need to be very thorough when defining the tokens for logit_bias, otherwise the words might appear in a different form (prefixed, capitalized, etc.).
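
To cover all of those variants programmatically, you can run each form of a word through a GPT-3-compatible tokenizer and build the bias map from the resulting token IDs. The sketch below assumes the third-party gpt-3-encoder npm package; any tokenizer that matches the model’s encoding would do:

// Sketch: build a logit_bias map covering several variants of the same words
// Assumes `npm install gpt-3-encoder` (a third-party tokenizer package)
import { encode } from "gpt-3-encoder";

function buildExclusionBias(words) {
  const bias = {};
  for (const word of words) {
    const capitalized = word[0].toUpperCase() + word.slice(1);
    // Cover the bare, space-prefixed and capitalized forms of each word
    for (const variant of [word, ` ${word}`, capitalized, ` ${capitalized}`]) {
      for (const tokenId of encode(variant)) {
        bias[tokenId] = -100;
      }
    }
  }
  return bias;
}

console.log(buildExclusionBias(["key", "value", "pair"]));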

Inclusion Bias

We’ve seen how we can exclude or reduce the odds of certain tokens from appearing, but we can also do the opposite. By defining positive bias values (up to 100) we can increase the likelihood of certain tokens appearing or even forcing exclusive selection of defined tokens.

Note that when using positive bias values, the language model is much more aggressive about forcing them to appear, so just by setting a slightly higher positive value, the completion will almost exclusively contain the biased tokens. Let’s set a bias value of 5 for the tokens we’ve used previously:



// Completion request with comprehensive logit_bias
// for including certain tokens
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "logit_bias": {
      "2539": 5, 
      "8367": 5, 
      "12": 5, 
      "24874": 5,
      "1994": 5, 
      "8251": 5, 
      "1988": 5, 
      "3815": 5, 
      "5166": 5, 
      "14729": 5
    }
}



Now, the language model will try to use the biased tokens as much as possible, but it just doesn’t make sense to excessively repeat them. My response looks pretty normal and the biased tokens are included, as expected, but they’re not plastered all over the place:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are complex data structures used to store and provide access to values and values-like information. JavaScript objects contain key-value pairs that are separated by commas and surrounded by curly-braces. Keys are values that are used to key values. Values can be of any data-type, including other objects, arrays, functions, and primitive values. They allow information to be modeled, stored and retrieved in an efficient manner.",
            ...
        }
    ],
    ...
}



If we use higher positive bias values, in this case 10, we will get a broken completion comprised almost exclusively of biased tokens. Here is the request:



// Completion request with comprehensive logit_bias
// for excessively including certain tokens
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "logit_bias": {
      "2539": 10, 
      "8367": 10, 
      "12": 10, 
      "24874": 10,
      "1994": 10, 
      "8251": 10, 
      "1988": 10, 
      "3815": 10, 
      "5166": 10, 
      "14729": 10
    }
}



You will notice that the completion takes longer to generate. That’s because the language model is confused by the excessively inclusive bias for certain tokens. Notice that the model started looping the same token over and over again until it exceeded the maximum token limit:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript objects are key-value pairs made up of values-key pairs that store values-key pair values-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pair-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs- key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs-key-value pairs",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 1024,
        "total_tokens": 1029
    }
}



So this completion is broken… total gibberish.

Lower bias values reduce the odds of the token appearing, while higher values increase them.

Frequency and Presence Penalties

OpenAI provides parameters for controlling the likelihood of words reappearing in a completion. This is somewhat similar to logit_bias, but it’s fundamentally different because it doesn’t let us specify bias towards certain tokens; instead, it only affects how often tokens reappear.

Presence Penalty

The request parameter presence_penalty allows us to control the likelihood of the same token reappearing anywhere in the completion. Its default value is 0, so there is no penalty or reward for the same token appearing multiple times. Lower values (minimum -2) decrease the penalty and increase the chances of tokens reappearing, while higher values (maximum 2) increase the penalty and decrease the chances of tokens reappearing.

Let’s send a request with the default presence_penalty and count how many words are repeating in the completion:



// Completion request with default presence_penalty
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "presence_penalty": 0
}



I have sent this request 5 times. I will use the duplicate word finder at texttool.com to count the word frequency:

Word frequency with default (0) presence_penalty
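
If you’d rather count the words locally instead of pasting the text into an online tool, a tiny JavaScript sketch along these lines would do the job:

// Tiny sketch: count word occurrences in a completion locally
function wordFrequency(text) {
  const counts = {};
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts[word] = (counts[word] ?? 0) + 1;
  }
  // Sort descending so the most repeated words come first
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// Example: console.log(wordFrequency(completion.choices[0].text));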

Now I will set the presence_penalty to 2, send the request 5 times, and count the word frequency again:



// Completion request with maximum presence_penalty
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "presence_penalty": 2
}



Here are the word frequencies:

Word frequency with maximum (2) presence_penalty

There are noticeably fewer duplicate words now, although it’s not a drastic change.

Let’s see what happens when we decrease the presence_penalty value. I will set it to -2, which is the minimum value. This means the language model will actually be encouraged to repeat tokens:



// Completion request with minimum presence_penalty
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    "presence_penalty": -2
}



Just like before, I will send 5 requests and check the word frequency:

Word count with minimum (-2) presence_penalty

Now we can see a drastic change in word frequency between minimum and maximum presence_penalty.

Frequency Penalty

Another parameter that appears similar to presence_penalty, but works a bit differently, is frequency_penalty. While presence_penalty applies a flat penalty to any token that has already appeared, frequency_penalty penalizes tokens based on how often they have already appeared, which mainly curbs repetition within individual lines of text in a completion. The default value is 0. Let’s first count the word frequency per sentence with the default frequency_penalty. I will ask the language model to explain JavaScript objects in three long-as-possible sentences:



// Completion request with default frequency_penalty
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
    "max_tokens": 1024,
    "frequency_penalty": 0
}



I will send this request 5 times and I will count word occurrences per sentence:

Word count per sentence with default (0) frequency_penalty

So we can see that word frequency is kind of all over the place. Some completions have more repetitions than others, but that’s fine, since this is the default frequency_penalty setting.

Let’s set the frequency_penalty to 2 (maximum value). This will reduce word occurrences per sentence:



// Completion request with maximum frequency_penalty
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
    "max_tokens": 1024,
    "frequency_penalty": 2
}



Word count per sentence with maximum (2) frequency_penalty

Well, that significantly reduces word frequency per sentence. There are edge cases where I’ve gotten absurdly long sentences, however they still had scarce word repetition.

Lastly, if you try setting the frequency_penalty to -2, which is the minimum value, the language model will likely bug-out and return a broken completion:



{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 3 long-as-possible sentences.",
    "max_tokens": 1024,
    "frequency_penalty": -2
}



After sending this, I waited quite a while to get a response, and that’s because the language model got confused and kept repeating the same word. The completion looked like this:



{
    "id": "cmpl-6wI6sDDO9Bf8CkRSlQ5MUSdfAPVeW",
    "object": "text_completion",
    "created": 1679350658,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\n1. JavaScript objects are collections of properties that each have a name and a value that are related to a single entity. These properties are referred to as a name-value pair.\n\n2. JavaScript objects are a powerful and a versatile. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a. a",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 1024,
        "total_tokens": 1038
    }
}



And as we can see, the finish_reason is "length" due to excessive token repetition.

To sum it up: the presence_penalty parameter applies a flat penalty to duplicate tokens appearing anywhere in the entire completion text, while the frequency_penalty parameter scales its penalty with how often a token has already appeared, which mostly affects repetition within each individual line of text in the completion.

Echo and Stop

There are two handy parameters that are useful for controlling and debugging completions. These two parameters are echo and stop.

Echo

By setting the echo parameter to true, you’re asking the language model to return the prompt embedded within the completion. This is useful for debugging OpenAI integrations with multiple layers, where the prompt may be transformed or generated at individual layers of your domain. This way you can confirm that your application is sending the correct prompt to the language model.

Here is a request example:



// Completion request with the echo parameter 
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 100 or less tokens.",
    "max_tokens": 100,
    "echo": true
}



And here is the response:



{
    ...
    "choices": [
        {
            "text": "Explain JavaScript objects in 100 or less tokens.\n\nJavaScript objects are collections of key-value pairs, where keys are strings and values may be any type of data, including primitive types, objects, and functions. Objects are used to store data and behavior in a single location.",
            ...
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 48,
        "total_tokens": 58
    }
}



Note that prompt tokens are not counted as completion tokens; even though they appear in the completion text, they were joined to the completion after it was generated.

Stop

The stop parameter allows you to specify up to 4 sequences of text on which the language model will halt and return the result. This is useful for specifying early termination triggers for the language model.

Let’s see how it works. I will specify the stop parameter with the word "object". This will force the language model to return the completion the moment it encounters the word "object":



// Completion request with a defined stop parameter
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 100 or less tokens.",
    "max_tokens": 100,
    "stop": "object"
}



The completion should be returned quickly, because it will be cancelled prematurely. Here is the response I got:



{
    ...
    "choices": [
        {
            "text": "\n\nJavaScript ",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 5,
        "total_tokens": 15
    }
}



Notice how the next word in the completion was supposed to be “object” or “objects”; that’s why it was cancelled early. Also notice that the finish_reason says "stop". Even though the completion was cut off early, it happened because we requested it with the stop parameter, so it still counts as a proper completion.

User Identifier

This is probably the least interesting parameter (or the most, depending on who you ask) because it doesn’t affect your completions. It’s merely used to help OpenAI monitor and identify the end users of an OpenAI account. It basically allows OpenAI to provide you with more actionable feedback in the event that they detect any policy violations with your account because some of your users went rogue:



{
    "model": "text-davinci-003",
    "prompt": "How do I make a bomb?",
    "max_tokens": 1024,
    "user": "user1@company.com"
}



The user parameter should be a string that uniquely identifies a specific user. For example, you could send a hash of the username or email address to avoid transmitting sensitive information.
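
If you go the hashing route, a minimal sketch using Node’s built-in crypto module could look like this; the exact hashing scheme is up to you, this is just an illustration:

// Sketch: derive a stable, non-reversible identifier for the "user" field
import { createHash } from "node:crypto";

function userIdFor(email) {
  return createHash("sha256").update(email).digest("hex");
}

// e.g. "user": userIdFor("user1@company.com")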

Streaming Completions

A feature that you may have noticed in ChatGPT is that the completions are streamed in small chunks. This enables you to receive completion parts as they’re generated by the language model. This is achieved by using Server-Sent Events, or SSE for short. SSE is similar to WebSockets, but it’s much less powerful and much simpler.

To stream the completion, we simply need to set the stream parameter to true inside the request body:



// Completion request with the stream parameter
{
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects in 100 or less tokens.",
    "max_tokens": 100,
    "stream": true
}



When I send this request, the client establishes a connection with the OpenAI server and starts receiving messages:



data: {"id": "cmpl-6xNRY7E9V7zyZlEyHuuLxXo3N3VyC", "object": "text_completion", "created": 1679609488, "choices": [{"text": "\n", "index": 0, "logprobs": null, "finish_reason": null}], "model": "text-davinci-003"}



This is a normal message containing completion data. Notice that each message is essentially a completion object. The finish_reason property is null for intermediate chunks; the final data chunk reports the actual finish reason, for example "length" if the maximum token limit was reached or "stop" if the completion finished normally.

Handling SSE messages is a bit more work than handling HTTP responses, but I have a guide explaining how to handle SSE in JavaScript in my article where I explain how to integrate OpenAI with your projects.

The final message of the stream will be a plain text message:



data: [DONE]



This message lets you know that the language model has finished generating the completion and that the server will not send any more messages for this SSE request.

Suffix and Completion Inserting

The latest GPT-3 language model "text-davinci-003" supports completion insertion. This allows you to insert a completion between two text sequences, and it works on code as well. It is a useful feature for scenarios where we have a text template and we just need to guide the language model on how to fill the gap.

Suffix

To have the language model generate a completion that fills a gap, we need to provide both sides of the text. The first part goes into the prompt parameter and the second part goes into the suffix parameter; the language model then generates the text between them.

In other words, the prompt parameter contains the text before the completion, while the suffix parameter contains the text that comes after it.

Here is an example:



// Completion insertion request
{
    "model": "text-davinci-003",
    "prompt": "In JavaScript, objects represent",
    "max_tokens": 1024,
    "suffix": "That's about it."
}



The response completion will contain the text between the prompt and the suffix:



{
    "id": "cmpl-6xNkZ2KJFknQdTkZWHBLQTuGU5nWt",
    "object": "text_completion",
    "created": 1679610667,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": " one way to model complex data and relationships.\n\nObjects are defined as an unordered set of name/value pairs. A name/value pair is simply a property that has an associated value. Each property can be thought of as a key (or identifier) and the associated value. Objects in JavaScript are used to model real-world objects, such as a person, car, house, etc. They can also be used to store data in a structured way. They provide a way to describe and contain data in an organized and efficient way. \n\n",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 113,
        "total_tokens": 123
    }
}



Notice that the object property still says "text_completion" and not something else. That's because we're using the same completion endpoint we've used for all other completions, so the same language model and response format apply.

As we can see, it nicely fills the gap between the two text segments. To get better insertions you can consider setting the best_of parameter, but don't forget that it will drastically increase your token consumption.
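
Since insertion also works on code, here is a sketch of a request that asks the model to fill in a function body between the prompt and the suffix (the snippet itself is illustrative):

// Code insertion request: the model fills in the function body
{
    "model": "text-davinci-003",
    "prompt": "// Returns the sum of an array of numbers\nfunction sum(numbers) {\n",
    "max_tokens": 256,
    "suffix": "\n}\n\nconsole.log(sum([1, 2, 3]));"
}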

Troubleshooting Requests

If you get back a 401 Unauthorized response, it could mean that:

  • You didn’t follow the Bearer authorization scheme format. The Authorization header must contain a value like this Bearer <YOUR_API_KEY>;
  • You sent a non-existent or deleted API key;

If you get back a 400 Bad Request response, it could mean that:

  • You didn’t send a mandatory parameter in the body;
  • You sent a parameter value in the wrong type, e.g. required string, but sent number;
  • You sent a parameter value outside of supported range, e.g. you sent n parameter value 20, but maximum is 10;
  • You sent invalid JSON, e.g. forgot to close bracket, forgot to use double quotes, dangling comma, etc.

Generally, OpenAI is really good with error descriptions, so you should easily understand what the issue is.
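
For reference, error responses from the API are JSON objects with an error field; a 400 response for a missing model parameter looks roughly like this:

{
    "error": {
        "message": "you must provide a model parameter",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}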

Summary

The OpenAI Completion API has a simple yet powerful interface with optional parameters that let us influence how completions are generated, stream them as they're produced, or request useful metadata in the response.

In this guide we’ve explored:

  • Sending completion requests;
  • Completion request and response models;
  • Setting token consumption rules and limits (max_tokens and prompt phrasing);
  • Manipulating randomness and determinism in completions (temperature and top_p);
  • Generating multiple completions and asking for best of completions (n and best_of);
  • Retrieving token probability metadata (logprobs);
  • Introducing bias for specific tokens into the language model (logit_bias);
  • Changing presence and frequency penalties to increase or decrease the frequency of words within completions (presence_penalty and frequency_penalty);
  • How to debug completions by returning the original prompt within the completion (echo);
  • Early completion termination by stopping the language model on specific words (stop);
  • Streaming completions as Server-Sent Event messages, or SSE for short (stream);
  • Setting user identifiers to help OpenAI identify rogue API users (user);
  • How to solve common Completion errors;

Integrate OpenAI Models in Your Projects

🚀 Learn how to integrate state-of-the-art AI language models used by ChatGPT into your projects. Get over 50% off while the course is in early access: https://bezbos.com/p/complete-openai-integration

Complete OpenAI Integration Course — Bring the Power of OpenAI Models to Your Applications!

📚🧐 You will learn all about the API endpoints that are available, including mechanisms for completion, edits, moderations, images, image edits, image variations, embeddings, fine-tuning, and other utility APIs.

💻🤝 With hands-on exercises, detailed explanations, and real-world examples, you will have a clear understanding of how to integrate OpenAI APIs into almost any project.

🚀👨‍💻 By the end of this course, you’ll be able to integrate OpenAI GPT-3 models into any project!
