Daniel Melo
Posted on November 2, 2024
A few days ago, Langtail released version 1.0. While they are mainly showcasing the prompt testing feature, what I actually find very cool is the ability to host tool functions directly on Langtail and to publish your chatbots. I think it's a very nice and quick way of prototyping chatbots.
I wanted to try out the hosted tool feature and use it for RAG. RAG is basically a hyped-up term for having the LLM look into some additional data before giving you a response. Having an LLM sort through your own data is a strong use case for many people, so the popularity of RAG makes sense.
The chatbot I want to build should solve a specific problem: when I ask an LLM about the OpenAI API, it keeps using the old API, which is very annoying. So I will make a chatbot that uses the latest info from the OpenAI API reference.
The chatbot and the tool function will be hosted on Langtail, but what about the data and its embeddings? I wanted a vector database that is free, easy to set up and use, and lets me store the actual text data there too. That led me to the Qdrant vector database: it has a generous free tier for the managed cloud option, and I can store the text data directly in the payload of the embeddings.
Requirements
- A Qdrant account, Qdrant cluster URL and API key (link)
- OpenAI API key (link)
- A Langtail account (link)
- (Optionally) Anthropic API key (link)
Collecting the data
For starters, we need to set up a simple Python project to get the data, create the embeddings, and push them to Qdrant.
In a new folder, install the dependencies:
virtualenv .venv
source .venv/bin/activate
pip install qdrant-client openai python-dotenv pyyaml
Download the OpenAPI definition of the OpenAI API using wget:
wget https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/master/openapi.yaml -O openai_openapi.yaml
Create a split_spec.py file that splits the definition into one file per endpoint:
import os

import yaml

# Load the full OpenAPI spec
with open("openai_openapi.yaml", "r") as file:
    data = yaml.safe_load(file)

os.makedirs("paths", exist_ok=True)

# Write each endpoint into its own YAML file,
# e.g. /chat/completions -> paths/chat_completions.yaml
for path, details in data["paths"].items():
    path_data = {path: details}
    file_name = os.path.join("paths", f"{path.strip('/').replace('/', '_')}.yaml")
    with open(file_name, "w") as f:
        yaml.dump(path_data, f)
Now just run it to create the paths folder with the files we will embed.
python split_spec.py
The data is ready, and the plan is to use the text-embedding-3-large model to create embeddings and upload them to Qdrant. The only problem is that YAML is not a great format to create embeddings from. To get around this, we can use the gpt-4o-mini model to generate a description of each endpoint specification and then embed the generated description instead of the YAML. The YAML can then be stored together with the embedding (in the payload) and still be available to us.
async def generate_detailed_description(path_spec):
    prompt = (
        "Generate an exhaustive summary of the following OpenAPI path specification. "
        "Only include information from the specification, cover the basic idea and use cases of the endpoint. "
        "Write everything in one paragraph optimized for embedding creation.\n"
        "Do not respond with any additional comments or explanations, respond with the description only!\n"
        f"\n```yaml\n{path_spec}\n```"
    )
    response = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": prompt},
        ],
        max_tokens=2048,
        temperature=0,
    )
    return response.choices[0].message.content
The whole file for creating the embeddings is here; I saved it as upload_ref.py. It uses python-dotenv to load environment variables from the .env file, which should look like this:
OPENAI_API_KEY=sk-proj-abc...
QDRANT_URL=https://.....qdrant.io:6333
QDRANT_API_KEY=ABCDEF-abc...
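The linked file is the source of truth, but if you just want the gist, the rest of upload_ref.py boils down to something like this minimal sketch (my reconstruction, not the exact file): it reuses generate_detailed_description from above, creates the collection with the 3072-dimensional vectors that text-embedding-3-large produces, and stores the raw YAML under the content payload key that the tool will read later.

# Minimal sketch of upload_ref.py, reconstructed from the article;
# the linked file is authoritative. Assumes generate_detailed_description()
# from above is defined in the same module.
import asyncio
import os

from dotenv import load_dotenv
from openai import AsyncOpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

load_dotenv()
openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])

COLLECTION_NAME = "openai_openapi"
VECTOR_SIZE = 3072  # dimension of text-embedding-3-large vectors


async def main():
    if not qdrant.collection_exists(COLLECTION_NAME):
        qdrant.create_collection(
            COLLECTION_NAME,
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
        )
    for i, name in enumerate(sorted(os.listdir("paths"))):
        with open(os.path.join("paths", name)) as f:
            path_spec = f.read()
        # Describe the endpoint in prose, then embed the description
        description = await generate_detailed_description(path_spec)
        embedding_response = await openai_client.embeddings.create(
            model="text-embedding-3-large", input=description
        )
        # Store the original YAML in the payload so the tool can return it verbatim
        qdrant.upsert(
            COLLECTION_NAME,
            points=[
                PointStruct(
                    id=i,
                    vector=embedding_response.data[0].embedding,
                    payload={"content": path_spec},
                )
            ],
        )


asyncio.run(main())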
The script should take a few minutes to complete:
python upload_ref.py
The script will create a new openai_openapi collection and upload all the embeddings into it. The data is ready now.
Set up the assistant in Langtail
Now we need to create an assistant that will use the Qdrant collection we just created.
First, after logging into Langtail, go to Secrets and, in the Env Variables tab, add the variables that we already have in the .env file: OPENAI_API_KEY, QDRANT_API_KEY, QDRANT_URL.
Then create a new assistant with a simple system prompt instructing the LLM not to use any information about the OpenAI API other than what it gets from the tool.
Finally, go to the tools menu (the tools button at the bottom of the playground) and generate a tool for retrieving info about the OpenAI API.
After all the "prompt engineering" edits, this is what my prompt and tool definition look like:
System prompt:
Your knowledge of OpenAI API is deprecated. All you know about OpenAI SDK is wrong. Use only the models, information and code snippets you get from the `retrieve_openai_api_info` tool!
Always provide a brief answer with a small example.
The tool definition:
{
  "name": "retrieve_openai_api_info",
  "parameters": {
    "type": "object",
    "required": [
      "search_term"
    ],
    "properties": {
      "search_term": {
        "type": "string",
        "description": "The term to search for related to OpenAI API endpoints. Write a long description of what you search for."
      }
    }
  },
  "description": "Retrieve information about an OpenAI API endpoint based on a search term"
}
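When the chat runs, the LLM fills in search_term on its own. Conceptually, a tool call might carry arguments like this (an illustrative example I made up, not a captured payload):

{
  "search_term": "how to create a chat completion, request body parameters, response format, streaming"
}

The wordy description on the parameter is intentional: it nudges the model to write a fuller query, which tends to embed (and therefore match) better than a couple of bare keywords.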
After the tool is created and you have it open, enable hosted code. In the hosted code tab you will see an invocation of the execute function with a callback, which will serve as the "main" function for our hosted code.
Now we need to do two things.
First, define a getEmbedding function that will call the OpenAI API and create an embedding from the search_term that the LLM gives us.
async function getEmbedding(input) {
  // Embed the search term with the same model that was used for the documents
  const response = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      input: input,
      model: "text-embedding-3-large"
    })
  });
  if (!response.ok) {
    throw new Error("Failed to fetch embedding: " + response.statusText);
  }
  const jsonResponse = await response.json();
  return jsonResponse.data[0].embedding;
}
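Note that the search term is embedded with the same model, text-embedding-3-large, that we used for the documents. This matters: vectors from different embedding models live in different spaces (and often have different dimensions), so mixing models would make the similarity search meaningless.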
Second, define a queryCollection function that will query the Qdrant database with the created embedding.
const COLLECTION_NAME = "openai_openapi";
const RESULTS_LIMIT = 1;

async function queryCollection(embedding) {
  const response = await fetch(`${process.env.QDRANT_URL}/collections/${COLLECTION_NAME}/points/query`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "api-key": process.env.QDRANT_API_KEY
    },
    body: JSON.stringify({
      query: embedding,
      with_payload: true, // we want the YAML stored in the payload, not just IDs and scores
      limit: RESULTS_LIMIT
    })
  });
  if (!response.ok) {
    throw new Error("Failed to query collection: " + response.statusText);
  }
  const jsonResponse = await response.json();
  return jsonResponse.result.points.map(p => p.payload.content);
}
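For reference, a successful response from Qdrant's Query API has roughly this shape (abridged; the field values here are made up), which is where the result.points and payload.content accesses above come from:

{
  "result": {
    "points": [
      {
        "id": 12,
        "score": 0.43,
        "payload": {
          "content": "/chat/completions:\n  post: ..."
        }
      }
    ]
  },
  "status": "ok",
  "time": 0.002
}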
Now, we can call both of these functions in the provided callback and return the relevant part of the API specification to the LLM.
export default execute(async (args, context) => {
  const embedding = await getEmbedding(args.search_term)
  const result = await queryCollection(embedding)
  return result.join("\n---\n") // join for the case RESULTS_LIMIT > 1
})
Now, still in the playground, you can test the assistant and finally save it.
To quickly test it, you can try this message:
Write a Python script that will use gpt to generate a story for every idea in ideas.txt. There is one idea per line.
When I tested different models, I found that, paradoxically, Claude performs better, while GPT-4o from OpenAI occasionally still uses the outdated openai.Completion.create(). Also, do not forget to set the temperature to zero and raise the max tokens limit if needed.
My final configuration:
- Model: claude-3-5-sonnet-latest
- Temperature: 0
- Max tokens: 2048
Publish the chatbot
In the playground, once everything is saved, you can click the share icon in the top right corner to publish your chatbot.
You can try the finished chatbot here.