How to add Retrieval-Augmented Generation (RAG) to your app using generated SDKs

Marcos Placona

Posted on October 3, 2024

Everyone has heard of ChatGPT, the popular large language model (LLM) that can generate human-like text and answer almost any question you ask with some degree of correctness. However, the biggest problem with these large language models is that they have only a limited amount of 'knowledge' to draw on: they are trained on data from the internet up to a particular date, and that is it.

For example, if you ask an LLM what today's date is, or what the weather is like in a particular city, it will not be able to answer you, because it just doesn't have that data.

This lack of data also limits an LLM's ability to answer questions, or reason, based on your own data. For example, if you have a database of customer reviews, and you want to ask the LLM a question about the reviews, it will not be able to answer you as it simply doesn't have access to that data. This is where retrieval-augmented generation (RAG) comes in, allowing you to retrieve data from your own systems to augment what the LLM can reason over.

What is RAG?

Retrieval-augmented generation, or RAG, is the term for augmenting the data that goes into the LLM by retrieving data from other systems, and using that data to help the LLM generate answers to your questions.

RAG example

For example, if you have an app that will prompt an LLM to give you the average sentiment of a particular product, you can use RAG to augment the LLM with the customer reviews of that product by extracting the relevant data from your reviews database, then sending that data to the LLM to reason over.
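To make the retrieve-then-augment idea concrete, here is a minimal sketch in plain C#. The `Review` record, the `ReviewPromptBuilder` class and the prompt wording are all hypothetical names made up for illustration, and the in-memory list stands in for whatever reviews database you actually have. It shows only the two steps before the LLM call: picking out the relevant reviews, and folding them into the prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical review record; a real app would load these from its own reviews database.
public record Review(string Product, string Text);

public static class ReviewPromptBuilder
{
    // Retrieval step: keep only the reviews for the product the user asked about.
    public static List<Review> Retrieve(IEnumerable<Review> allReviews, string product) =>
        allReviews
            .Where(r => string.Equals(r.Product, product, StringComparison.OrdinalIgnoreCase))
            .ToList();

    // Augmentation step: put the retrieved reviews into the prompt so the LLM can reason over them.
    public static string BuildPrompt(string question, IReadOnlyList<Review> reviews)
    {
        var lines = reviews.Select((r, i) => $"{i + 1}. {r.Text}");
        return "Answer the question using only the customer reviews below.\n\n"
             + "Reviews:\n" + string.Join("\n", lines) + "\n\n"
             + "Question: " + question;
    }
}
```

The string returned by `BuildPrompt` is what you would send to the LLM in place of the user's bare question.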

Sequence diagram

How does your app implement RAG?

If you have a prompt-based app that allows the user to interact using pure text, then your app will start from a prompt that is user generated, such as "What is the average sentiment of the cuddly llama toy?". Your app will then use some kind of LLM-powered app framework that has plugins, which are add-ons in the app that can retrieve data. This framework will use the LLM to determine which plugins it needs to call to get data, then send that data back to the LLM, along with an updated prompt, to reason over.

Sequence Diagram

How do you add RAG to your LLM app?

One of the best ways to build an LLM app is to use some kind of AI app framework, such as Semantic Kernel or LangChain. These frameworks support multiple programming languages, and work with your LLM of choice. For example, you can build an app in Python using LangChain that uses Llama 2, or an app in C# using Semantic Kernel that uses ChatGPT. These frameworks have a range of features built in that you would want for an LLM app, such as memory so that the results of one prompt can be used in the next.

These frameworks also support plugins - add-ons that you can build to extend the capability of the LLM, supporting tasks such as RAG. These plugins advertise their capabilities to the framework, and the LLM can use this information to decide which plugins to use depending on the prompt.
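As a rough illustration of what "advertising capabilities" can mean, the sketch below uses reflection to collect the natural language descriptions attached to a plugin's methods. This is not how Semantic Kernel or LangChain are implemented internally; `WeatherPlugin` and `PluginCatalog` are hypothetical names, and the only real API used is the standard .NET `DescriptionAttribute`. A framework would pass a list like this to the LLM so it can pick a plugin that matches the prompt.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Reflection;

// Hypothetical plugin: the Description text is what gets advertised.
public class WeatherPlugin
{
    [Description("Gets the current weather for a city.")]
    public string GetWeather(string city) => $"It is sunny in {city}."; // canned answer for illustration
}

public static class PluginCatalog
{
    // Builds one "Type.Method: description" entry for every public instance
    // method declared on a plugin that carries a [Description] attribute.
    public static List<string> Describe(params object[] plugins)
    {
        var entries = new List<string>();
        foreach (var plugin in plugins)
        {
            var type = plugin.GetType();
            var methods = type.GetMethods(
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);
            foreach (var method in methods)
            {
                var description = method.GetCustomAttribute<DescriptionAttribute>();
                if (description != null)
                {
                    entries.Add($"{type.Name}.{method.Name}: {description.Description}");
                }
            }
        }
        return entries;
    }
}
```

Calling `PluginCatalog.Describe(new WeatherPlugin())` gives a one-entry list describing `GetWeather`, which is the kind of summary an LLM can match against a prompt such as "What is the weather like in Seattle?".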

How do you build a plugin?

Plugins are built in the same language that you use to build your LLM app with the framework. For example, if you are building a C# app using Semantic Kernel, then you would build your plugin in C#. As part of defining your plugin, you provide a natural language description of what your plugin can do, and the framework will use this to decide which plugins to use.

Here's an example of the outline of a simple C# plugin for Semantic Kernel that retrieves cat facts:

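using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class CatFactPlugin
{
    // [KernelFunction] exposes the method to the framework;
    // [Description] is the natural language text used to decide when to call it.
    [KernelFunction]
    [Description("Gets a cat fact.")]
    public async Task<string> GetCatFact()
    {
        // Placeholder: replace with a real lookup, such as an API call
        await Task.CompletedTask;
        return "A group of cats is called a clowder.";
    }
}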

Where does the data come from?

RAG is retrieval-augmented generation, so the data that you use to augment the LLM has to be retrieved from somewhere. This could be a third-party system, or it could be one of your own internal systems. Typically you would access these systems via an API. So although we are in the shiny new world of AI, we are still back to the age-old problem: we have to integrate with an API. This means reading the API docs, worrying about authentication and retry strategies, building some JSON to make the request, then parsing the JSON response, and dealing with all the issues that come with that.
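To give a feel for that work, here is a minimal sketch of a hand-written client with a basic retry loop and manual JSON parsing. The endpoint URL and the `{"fact": "..."}` response shape are assumptions made up for illustration, as is the `ManualCatFactClient` name, so check the real API docs before relying on any of it. Authentication is left out entirely, and a real integration would need that too.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class ManualCatFactClient
{
    // Hypothetical endpoint; the real URL would come from the API docs.
    private const string FactUrl = "https://example.com/fact";

    // Calls the API, retrying failed requests up to maxAttempts times in total.
    public static async Task<string> GetFactAsync(HttpClient http, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using var response = await http.GetAsync(FactUrl);
                response.EnsureSuccessStatusCode(); // throws HttpRequestException on a non-2xx status
                var json = await response.Content.ReadAsStringAsync();
                return ParseFact(json);
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Simple linear backoff before the next attempt
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
            }
        }
    }

    // Pulls the "fact" string out of the JSON body (assumed shape), failing loudly if it is missing.
    public static string ParseFact(string json)
    {
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.TryGetProperty("fact", out var fact)
            && fact.ValueKind == JsonValueKind.String)
        {
            return fact.GetString() ?? "";
        }
        throw new FormatException("Response did not contain a string \"fact\" property.");
    }
}
```

Even this small version has to make decisions about retries, status codes and malformed responses, and you would repeat all of it for every endpoint you call.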

How do SDKs help with RAG?

Let's be honest here - very few developers will interact directly with an API. We all write some kind of layer of abstraction over the API to make it easier to use. We write it, and we have to maintain it, which is a lot of work.

This is where generated SDKs come in. SDKs are software development kits that wrap the API, and provide a simpler way to interact with the API, the ultimate layer of abstraction. By using an SDK, you have strong typing (depending on your programming language of choice of course), best practices like authentication built in, and you don't have to worry about the JSON parsing. You also get autocomplete in your IDE, the ability for AI coding tools like GitHub Copilot to help you, and inline documentation.

And the best thing about generated SDKs is that they are automatically generated from your OpenAPI spec. This means that you don't have to write the SDK, or maintain it - you just generate it, then use it. And once generated, it can be kept up to date automatically inside your CI/CD pipelines.

Generating SDKs for your RAG plugins

When it comes to generating SDKs, liblab is your friend. liblab is a platform that generates SDKs from your OpenAPI spec, so you can use them in your app. Whether you are accessing internal APIs, or third party APIs, all you need is an API spec, and liblab will generate the SDK for you.

We've recently released a full tutorial that will walk you through how to add retrieval-augmented generation (RAG) to your AI app using autogenerated SDKs that wrap your internal APIs. This tutorial uses Semantic Kernel and ChatGPT, along with the C# SDK generation capabilities of liblab, and takes you through the process of building a plugin that retrieves cat facts, and using that plugin in your app. Whilst cat facts are important, I understand you probably want to use your own internal systems, but the same principles apply to any API!

A cute plushie cat sitting on a laptop keyboard

Check it out on the liblab developer portal.

For example, to implement the cat fact plugin mentioned earlier, you could use the Cat Facts API. Using liblab, you can generate a C# SDK for the Cat Facts API.

You can find a liblab config file to generate the cat facts SDK in a template repo we've published to augment the tutorial.

Once you have the cat facts SDK, you can use it in your plugin to retrieve cat facts:

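using System;
using System.ComponentModel;
using System.Threading.Tasks;
using CatFacts;
using Microsoft.SemanticKernel;

public class CatFactPlugin
{
    // SDK client generated by liblab from the Cat Facts API spec
    private readonly CatFactsClient _client = new();

    [KernelFunction]
    [Description("Gets a cat fact.")]
    public async Task<string> GetCatFact()
    {
        Console.WriteLine("CatFactPlugin > Getting a cat fact from the Cat Facts API...");
        // The SDK makes the HTTP call and parses the response into a typed object
        var response = await _client.Facts.GetRandomFactAsync();
        Console.WriteLine("CatFactPlugin > Cat fact: " + response.Fact);
        return response.Fact;
    }
}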

In this code, just two lines deal with getting the cat fact from the SDK: the declaration of a field to hold the SDK client, and the call to the SDK to get the cat fact. The SDK takes care of all the complexity of calling the API and parsing the response. Much nicer than writing all that code yourself!

This plugin can then be used in your app to retrieve cat facts, and augment the LLM with the data. For example, you can have the LLM give you the cat fact in the style of a pirate:

User > Give me a fact about cats in the style of a pirate
CatFactPlugin > Getting a cat fact from the Cat Facts API...
CatFactPlugin > Cat fact: A group of cats is called a clowder.
Assistant > Arr matey! Be ye knowin' that a gatherin' of meowin' seafarers,
them cats, be called a clowder? Aye, a fine group of whiskered buccaneers they be!

A cute plushie cat dressed as a pirate

Again, maybe less helpful in the real world, but you get the idea! In the real world you could use the LLM to reason over your own data, for example retrieving data from an internal review system, and providing a list of the most popular products based on the reviews.

Conclusion

Adding retrieval-augmented generation (RAG) to your AI app can be a powerful way to help your LLM reason over your own data. Using autogenerated SDKs to wrap your internal APIs makes it easier to add RAG to your app, giving you more time to focus on building the best AI experience for your users.

Get started with liblab today!

This article was originally written by Jim Bennett for the liblab blog
