Make OpenAI Function Calling Work Better and Cheaper with a Two-Step Function Call 🚀
Krisztián Maurer
Posted on March 10, 2024
I tried OpenAI's feature for calling local functions in a project with many functions. It worked well, even with lots of functions, but the cost of using OpenAI's API increased significantly. This happens because the function definitions (JSON schemas) we send along with our main request count as input tokens, making each request more expensive. The problem is that we often send far more functions than necessary, even though the AI only needs a few of them to respond to our request.
So, I had an idea to save money and improve performance: what if we only send the details of the functions the AI actually needs? Here's how it works. First, we send a request with our main question or task, including a list of all the functions we could use, but without the detailed schemas for those functions; we give only a brief description of each. The AI then tells us exactly which functions it needs to answer our question or complete our task. After that, we send a second request with the detailed schemas for only those functions.
This method can significantly lower the cost per message because we only send the details for the functions that are necessary.
From another point of view, this method also makes the AI work faster and better. When we send too many detailed functions, we give the AI too much information to handle at once, which can hurt its performance because it has to sift through a lot of extra detail. If we only send the essential information the AI needs, we help it stay focused and accurate, and we free up room to include more genuinely useful context. For example, when using GPT-3 with many functions, we quickly hit the model's context window limit. By being selective about what we send, we avoid reaching this limit too soon.
Basic tool call example
Let's look at a basic example of how function calling works. We start by sending the AI a question or task along with a list of all function schemas it can use. If the AI needs more information or has to do a specific job to answer our question, it picks one of the functions we've given it to help find the answer.
When you use tool or function calling, you're essentially giving the AI model a way to 'call out' to an external function. This could be anything from performing a complex mathematical calculation, accessing a database for specific information, running a custom algorithm, or even interacting with web services. The function executes the task and returns the result to the AI, which then incorporates this information into its response.
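To make the standard single-step flow concrete, here is a minimal sketch in TypeScript. The `get_weather` tool, its schema, and the dispatch logic are all illustrative assumptions (not from the linked repo), and the model's response is mocked so the example stays self-contained with no live API call:

```typescript
// Detailed JSON schema for one tool, in the shape the Chat Completions
// API expects under its `tools` parameter.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// A local implementation the AI can "call out" to (stubbed here).
function getWeather(args: { city: string }): string {
  return `Sunny in ${args.city}`;
}

// Dispatch a tool call returned by the model to the matching local function.
function runToolCall(call: { name: string; arguments: string }): string {
  if (call.name === "get_weather") {
    return getWeather(JSON.parse(call.arguments));
  }
  throw new Error(`Unknown tool: ${call.name}`);
}

// Mocked model response: the model chose to call get_weather for Paris.
const result = runToolCall({
  name: "get_weather",
  arguments: '{"city":"Paris"}',
});
console.log(result); // "Sunny in Paris"
```

In a real application, `runToolCall`'s result would be sent back to the model as a `tool` message so it can fold the answer into its final response.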
https://platform.openai.com/docs/guides/function-calling
Two-Step Tool Call example
The process is simpler than it sounds, and here's a straightforward explanation:
Start with a Single Tool: We begin by using a special tool called the "tool descriptor." This tool takes one parameter, a list named neededTools, which specifies the tools that might be needed. You list all the available tools here.
Requesting Specific Tools: If the AI determines it needs certain tools to complete its task, it requests them through the "tool descriptor" by specifying which tools it needs from the neededTools list.
Providing the Requested Tools: Once the AI requests specific tools, we then supply these requested tools to the AI.
AI Uses the Tools: Now that the AI has the tools it specifically asked for, it can go ahead and process the request, using the tools as needed to come up with a final answer. Occasionally, during this process, the AI might realize it needs an additional tool it didn't request initially. If that happens, the process starts over, and we provide the newly requested tool.
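The steps above can be sketched as follows. The shapes of `toolDescriptor` and the brief/detailed schema split are my own illustrative assumptions rather than the exact code from the linked repo, and the model's replies are again mocked:

```typescript
// Detailed schemas for every tool we support (only two shown here).
const detailedSchemas: Record<string, object> = {
  get_weather: {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
  get_time: {
    type: "function",
    function: {
      name: "get_time",
      description: "Get the current time in a timezone",
      parameters: {
        type: "object",
        properties: { tz: { type: "string" } },
        required: ["tz"],
      },
    },
  },
};

// Step 1: the only tool we send up front. Its single parameter,
// `neededTools`, lists every available tool by name; the brief
// descriptions live in the tool's description string.
const toolDescriptor = {
  type: "function",
  function: {
    name: "tool_descriptor",
    description:
      "Declare which tools you need. Available: " +
      "get_weather (current weather for a city), " +
      "get_time (current time in a timezone).",
    parameters: {
      type: "object",
      properties: {
        neededTools: {
          type: "array",
          items: { type: "string", enum: Object.keys(detailedSchemas) },
        },
      },
      required: ["neededTools"],
    },
  },
};

// Step 2: once the model calls tool_descriptor with the names it needs,
// send only those detailed schemas in the follow-up request.
function selectTools(needed: string[]): object[] {
  return needed.map((name) => {
    const schema = detailedSchemas[name];
    if (!schema) throw new Error(`Unknown tool: ${name}`);
    return schema;
  });
}

// Suppose the model's tool_descriptor call requested just one tool:
const secondRequestTools = selectTools(["get_weather"]);
console.log(secondRequestTools.length); // 1
```

If the model later asks for a tool it didn't request initially, you simply run `selectTools` again with the new name and issue another follow-up request.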
Here's a simple way to look at it using an example: Imagine we have 100 tools available, but the AI only needs 2 to answer a question. Instead of sending all 100 tool descriptions upfront, we initially send just the "tool descriptor" request. Then, based on the AI's needs, we only provide the 2 necessary tools. By using just 3 tool JSON schemas instead of 100, we save resources and make things more efficient. This approach uses fewer tokens, which is cheaper, and it also boosts performance. Having too many details can actually make the AI less accurate.
Check out how this method works with this code example: https://github.com/MaurerKrisztian/two-step-llm-tool-call
Thanks for taking the time to read! I hope you found it helpful. If you're interested in seeing how to do this in Python, just let me know in the comments.