Multi AI Agent Systems using OpenAI's new GPT-4o Model
Ken Collins
Posted on May 18, 2024
A few weeks ago we explored using OpenAI's new Assistants API to build a personal creative assistant capable of creating consistent on-brand artwork using Ideogram. Back then I promised we would explore expert-based architectures in a future post... and today is that day. 🥳
Two major updates have happened since then. First, the Assistants API now supports vision 👀, allowing messages in a thread to become truly multi-modal. Second, and most importantly, OpenAI finally released a new model, GPT-4o. The o stands for omni, and the model delivers.
Introducing Experts.js
The new Assistants API (still in beta) from OpenAI sets a new industry standard, significantly advancing beyond the widely adopted Chat Completions API. It represents a major leap in the usability of AI agents and the way engineers interact with LLMs. Paired with the cutting-edge GPT-4o model, Assistants can now reference attached files & images as knowledge sources within a managed context window called a Thread. Unlike Custom GPTs, Assistants support instructions up to 256,000 characters, integrate with up to 128 tools, and utilize the innovative Vector Store API for efficient file search on up to 10,000 files per assistant!
Experts.js aims to simplify the usage of this new API by removing the complexity of managing Run objects and allowing Assistants to be linked together as Tools.
const thread = await Thread.create();
const assistant = await MyAssistant.create();
const output = await assistant.ask("Say hello.", thread.id);
console.log(output); // Hello
Why Experts.js
Please read over the project's documentation on GitHub for a full breakdown of Experts.js capabilities and options. I think you will find the library small and easy to understand, and you will immediately see the value in using it.
https://github.com/metaskills/experts
However, for our group, I wanted to explore a very real use case for Experts.js: a company assistant that acts as the main router for an entire panel of experts. This sales and router expert has one tool, a merchandising expert. That merchandising tool in turn has its own tool, one capable of searching an OpenSearch vector database. The idea here is that each Assistant owns its domain and context. Why would a company sales assistant need to know (and waste tokens on) how to perform amazing OpenSearch queries? Likewise, being an amazing accounts or orders assistant requires context and tools that would likely confuse another.
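To make that concrete, here is a minimal sketch of how such experts could be linked together as Tools. The class names, instructions, and function definition are illustrative assumptions, not code from the project; see the Experts.js README for the canonical options.

import { Assistant, Tool } from "experts";

// Hypothetical merchandising expert. In the full setup it would attach its
// own OpenSearch-backed tool the same way the company assistant attaches it.
class ProductsTool extends Tool {
  constructor() {
    super({
      name: "Products Tool",
      instructions: "Answer product and merchandising questions.",
      parentsTools: [
        {
          type: "function",
          function: {
            name: "products_tool",
            description: "Ask the merchandising expert a product question.",
            parameters: {
              type: "object",
              properties: { message: { type: "string" } },
              required: ["message"],
            },
          },
        },
      ],
    });
  }
}

// Top-level sales/router assistant exposed to users.
class CompanyAssistant extends Assistant {
  constructor() {
    super({
      name: "Company Assistant",
      instructions:
        "Route customer questions to the right expert and answer in the company voice.",
    });
    this.addAssistantTool(ProductsTool);
  }
}

const assistant = await CompanyAssistant.create();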
Yet, given this architecture there are some critical flaws that need to be addressed.
- Data loss moving from left to right. The grapevine effect. See how messages get truncated or reinterpreted? Some of that behavior is good; you want experts to contextualize. However, this is clearly a problem.
- Assistants-only outputs. The typical mental model for most Multi-Agent Systems (MAS) takes the output of one LLM as the input or results to another. See how the Products Tool got all the great aggregate category information? But the main assistant only knows what it was told. If asked a follow-up question, it would not have the true data to respond. Worse, it may summarize a summary to the user. Also, that Products Tool's output is just wasted tokens.
- Some Assistants can leverage many tools, and some of those tools' outputs should be surfaced to their parent's context. In this case there was an image created by Code Interpreter which has no way to make it to the parent company assistant.
The fix is pretty simple. The Experts.js framework allows Tools to control their output, so we can redirect or pipe all knowledge where it needs to go. The grapevine data loss is an easy fix. Models such as gpt-4o are great at following instructions, and a little prompt engineering ensures messages or tool calls have all the context they need.
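As an illustration of the kind of prompt engineering meant here (the wording below is an assumption, not text from the project), the router's instructions can spell out exactly how to delegate:

// Illustrative instructions for the router assistant. The point is to tell
// the model to pass complete context along when it calls a tool.
const instructions = `
You are the company sales assistant.
When you call the Products Tool, forward the customer's question verbatim,
plus any constraints already gathered in this conversation (budget, category,
prior answers). Never summarize before delegating.
`;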
Lastly, thread management. By default, each Tool in Experts.js has its own thread & context. This avoids a potential thread locking issue, which would happen if a Tool shared an Assistant's thread that is still waiting for tool outputs to be submitted. The following diagram illustrates how Experts.js manages threads on your behalf to avoid this problem.
All questions to your experts require a thread ID. For chat applications, the ID would be stored on the client, such as in a URL path parameter. With Experts.js, no other client-side IDs are needed. As each Assistant calls an LLM-backed Tool, it will find or create a thread for that tool as needed. Experts.js stores this parent -> child thread relationship for you using OpenAI's thread metadata.
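A rough sketch of what that looks like in a chat backend. The Express routes and the CompanyAssistant from the earlier sketch are assumptions for illustration; only Thread.create() and ask(message, threadId) come from Experts.js itself.

import express from "express";
import { Thread } from "experts";

const app = express();
app.use(express.json());

// Create a thread once per conversation and hand its ID back to the client,
// which keeps it in the URL path.
app.post("/chats", async (_req, res) => {
  const thread = await Thread.create();
  res.json({ threadId: thread.id });
});

// Every follow-up message only needs that one client-side ID. Experts.js
// finds or creates the per-tool child threads behind the scenes.
app.post("/chats/:threadId", async (req, res) => {
  const assistant = await CompanyAssistant.create();
  const output = await assistant.ask(req.body.message, req.params.threadId);
  res.json({ output });
});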
Other Multi-Agent Systems (MAS)
A lot of research has been done in this area, and we can expect a lot more in 2024 in this space. I promise to share some clarity around where I think this industry is headed. In personal talks I have warned that multi-agent systems are complex and hard to get right. I've seen little evidence of real-world use cases too. So if you are considering exploring MAS, put your prosumer hat on, roll up your sleeves, and prepare to get your hands dirty with Python ☹️
In my opinion, exploration of multi-agent systems is going to require a broader audience of engineers. For AI to become a true commodity, it needs to move out of its Python origins and into more popular languages like JavaScript 🟨, a major factor in why I wrote Experts.js.
I very much hope folks enjoy this framework and that it helps the community at large figure out where and how Multi-Agent AI Systems can be effective. 💕