What is Retrieval Augmented Generation (RAG)?
Jeffrey Ip
Posted on October 25, 2023
TL;DR
In this article, I'm going to talk about what RAG is and how to implement a RAG-based LLM application (yes, with a complete code sample).
Let's dive right in.
DeepEval - open-source evaluation framework for LLM applications
DeepEval is a framework that helps engineers evaluate the performance of their LLM applications by providing default metrics to measure hallucination, relevancy, and much more.
We are just starting out, and we really want to help more developers build safer AI apps. Would you mind giving it a star to help spread the word, please?
What is RAG?
Retrieval augmented generation is a technique in NLP that allows LLMs like ChatGPT to generate customized outputs using data outside the scope of what they were trained on. An LLM application without RAG is akin to asking ChatGPT to summarize an email without providing the actual email as context.
A RAG system consists of two primary components: the retriever and the generator.
The retriever is responsible for searching the knowledge base for the pieces of information most relevant to the given input; these are referred to as the retrieval results. The generator then inserts these retrieval results into a predefined prompt template and uses the resulting prompt to produce a coherent and relevant response to the input.
Here's a diagram of a RAG architecture.
In most cases, your "knowledge base" consists of vector embeddings stored in a vector database like ChromaDB, and your "retriever" will 1) embed the given input at runtime, 2) search the vector space containing your data for the top K most relevant retrieval results, and 3) rank those results by relevancy (that is, by their distance to the embedded input). The results are then turned into a prompt and passed to your "generator", which is your LLM of choice (GPT-4, Llama 2, etc.).
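To make the embed, search, and rank steps concrete, here is a minimal in-memory sketch in plain Python. It assumes the embeddings already exist: the 3-dimensional vectors below are hypothetical stand-ins for real embedding-model output, and it ranks by cosine similarity, which is one common choice (vector databases usually let you pick the metric; ChromaDB, for example, supports L2, cosine, and inner product).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_embedding: list[float],
    knowledge_base: list[tuple[str, list[float]]],
    k: int,
) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and return the k best, most similar first."""
    scored = [(text, cosine_similarity(query_embedding, emb)) for text, emb in knowledge_base]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # rank by relevancy
    return scored[:k]


# Hypothetical 3-dimensional embeddings standing in for real model output,
# which typically has hundreds or thousands of dimensions.
knowledge_base = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("The Eiffel Tower is in Paris.", [0.6, 0.7, 0.1]),
    ("Python is a programming language.", [0.0, 0.2, 0.9]),
]
# Pretend this is the embedding of "What is the capital of France?".
query_embedding = [0.85, 0.2, 0.05]

for text, score in top_k(query_embedding, knowledge_base, k=2):
    print(f"{score:.3f}  {text}")
```

A real vector database replaces the linear scan in top_k with an approximate nearest-neighbor index, but the contract is the same: embedding in, top K chunks out, best first.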
For more curious users, here are the methods a retriever commonly employs to find the most pertinent retrieval results:
Neural Network Embeddings (e.g. OpenAI's or Cohere's embedding models): rank documents by their proximity to the input in a multidimensional vector space, capturing the semantic relationships between an input and the document corpus.
Best Match 25 (BM25): a probabilistic retrieval model that improves text retrieval precision. By combining term frequencies with inverse document frequencies, it accounts for term significance, so that both common and rare terms influence the relevance ranking.
TF-IDF (Term Frequency-Inverse Document Frequency): calculates the significance of a term within a document relative to the broader corpus. By weighing a term's occurrence in a document against its rarity across the corpus, it produces a relevance ranking.
Hybrid Search: improves the relevance of search results by assigning different weights to different methods, such as Neural Network Embeddings, BM25, and TF-IDF, and combining their scores.
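Here is a rough sketch of BM25 scoring and one simple way to build a hybrid search on top of it. The BM25 function follows the standard Okapi formula (with the common k1=1.5, b=0.75 defaults and naive whitespace tokenization), and the hybrid step is a weighted sum of min-max-normalized scores. That is only one fusion strategy (reciprocal rank fusion is another popular one), and the dense scores are hypothetical numbers standing in for embedding similarities.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query, using naive whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: the number of documents containing each term at least once.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Inverse document frequency: rarer terms carry more weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # k1 saturates repeated terms; b normalizes for document length.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so different retrievers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(dense: list[float], sparse: list[float], alpha: float = 0.5) -> list[float]:
    """Weighted sum of normalized embedding (dense) and BM25 (sparse) scores."""
    return [alpha * d + (1 - alpha) * s for d, s in zip(min_max(dense), min_max(sparse))]


documents = [
    "the cat sat on the mat",
    "dogs like to chase cats",
    "the stock market fell today",
]
sparse = bm25_scores("cat on mat", documents)
# Hypothetical cosine similarities from an embedding model for the same query.
dense = [0.82, 0.75, 0.10]
for doc, score in zip(documents, hybrid_scores(dense, sparse, alpha=0.5)):
    print(f"{score:.3f}  {doc}")
```

The second document scores zero under BM25 because "cats" is not an exact match for "cat", but its dense score keeps it in contention. This complementary behavior is the usual motivation for weighting the two signals together; alpha controls how much each one counts.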
Applications
RAG has various applications across different fields due to its ability to combine retrieval and generation of text for enhanced responses. Having worked with numerous companies building LLM applications at Confident, here are the top four use cases I've seen:
Customer support / user onboarding chatbots: no surprises here, retrieve data from internal documents to generate more personalized responses. Click here to read a full tutorial on how to build one yourself using LlamaIndex.
Data extraction: interestingly, we can use RAG to extract relevant data from documents such as PDFs. You can find a tutorial on how to do it here.
Sales enablement: retrieve data from LinkedIn profiles and email threads to generate more personalized outreach messages.
Content creation and enhancement: retrieve data from past message conversations to generate suggested message replies.
In the following code walkthrough, we'll be building a very generalized chatbot, and you'll be able to customize its functionality into any of the use cases listed above by tweaking the prompts and the data stored in your vector database.
Project Setup
For this project, we're going to build a question-answering (QA) chatbot based on your knowledge base. We're not going to cover how to index your knowledge base, as that's a discussion for another day.
We're going to use Python, ChromaDB for our vector database, and OpenAI for both vector embeddings and chat completion. We're going to build a chatbot on your favorite Wikipedia page.
First, set up a new project directory and install the dependencies we need.
mkdir rag-llm-app
cd rag-llm-app
python3 -m venv venv
source venv/bin/activate
Your terminal should now start with something like this:
(venv)
Installing dependencies
pip install openai chromadb
Next, create a new main.py file, the entry point to your LLM application.
touch main.py
Getting your API keys
Lastly, go ahead and get your OpenAI API key here if you don't already have one, and set it as an environment variable:
export OPENAI_API_KEY="your-openai-api-key"
You're good to go! Let's start coding.
Building a RAG-based LLM application
Begin by creating a Retriever class that will retrieve the most relevant data from ChromaDB for a given user question.
Open main.py and paste in the following code:
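import os

import chromadb
from chromadb.utils import embedding_functions
import openai

# chromadb.Client() is in-memory: it only sees collections created in this process.
# To reuse data you indexed earlier, use chromadb.PersistentClient(path=...) instead.
client = chromadb.Client()
client.heartbeat()  # sanity check that the client is up

class Retriver:
    def __init__(self):
        pass

    def get_retrieval_results(self, query, k=3):
        # Embed the query with the same OpenAI model used to index the collection.
        openai_ef = embedding_functions.OpenAIEmbeddingFunction(
            api_key=os.environ["OPENAI_API_KEY"],
            model_name="text-embedding-ada-002",
        )
        collection = client.get_collection(name="my_collection", embedding_function=openai_ef)
        retrieval_results = collection.query(
            query_texts=[query],
            n_results=k,
        )
        # One inner list per query text; we sent a single query, so take the first.
        return retrieval_results["documents"][0]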
Here, openai_ef is the embedding function ChromaDB uses under the hood to vectorize an input. When a user sends a question to your chatbot, a vector embedding is created from this question using OpenAI's text-embedding-ada-002 model. ChromaDB then uses this vector embedding to perform a vector similarity search in the collection's vector space, which contains data from your knowledge base (remember, we're assuming you've already indexed data for this tutorial). This process lets you find the top K most relevant retrieval results for any given input.
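ChromaDB's collection.query returns a dictionary of parallel lists with one inner list per query text, which is why the retriever reads retrieval_results["documents"][0]. Results come back nearest-first, and by default the response also includes a "distances" list, where smaller means more similar. The sketch below pairs each document with its distance; the sample dictionary is hypothetical, written by hand to mimic that layout rather than produced by a real query, and ranked_documents is an illustrative helper, not part of ChromaDB.

```python
def ranked_documents(query_result: dict, query_index: int = 0) -> list[tuple[str, float]]:
    """Pair each retrieved document with its distance for one query text.

    Assumes ChromaDB's query layout: parallel lists with one inner list per
    query text, ordered from nearest (smallest distance) to farthest.
    """
    documents = query_result["documents"][query_index]
    distances = query_result["distances"][query_index]
    return list(zip(documents, distances))


# Hypothetical result shaped like collection.query(query_texts=[input], n_results=2).
sample_result = {
    "ids": [["doc-7", "doc-2"]],
    "documents": [["Paris is the capital of France.", "The Eiffel Tower is in Paris."]],
    "distances": [[0.18, 0.42]],
}

for document, distance in ranked_documents(sample_result):
    print(f"{distance:.2f}  {document}")
```

Keeping the distances around is handy for debugging retrieval quality, or for dropping results that are too far from the question before they reach the generator.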
Now that you've created your retriever, paste in the following code to create a generator:
...
# The OpenAI client (openai>=1.0) reads OPENAI_API_KEY from the environment.
openai_client = openai.OpenAI()

class Generator:
    def __init__(self, openai_model="gpt-4"):
        self.openai_model = openai_model
        self.prompt_template = """You're a helpful assistant with a thick country accent. Answer the question below using the context provided, and if you don't know the answer, say you don't know.

Context:
{context}

Question: {question}
"""

    def generate_response(self, question, retrieval_results):
        # Join the retrieved chunks into one context block and fill in the template.
        context = "\n\n".join(retrieval_results)
        prompt = self.prompt_template.format(context=context, question=question)
        response = openai_client.chat.completions.create(
            model=self.openai_model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response.choices[0].message.content
Here, the generate_response method joins the retrieval_results provided by the retriever we built earlier into a single block of context, inserts it into the prompt template alongside the user's question, and sends the resulting prompt to OpenAI to generate an answer. The call uses the client interface introduced in version 1.0 of the openai package, which replaced the older openai.ChatCompletion.create. Using RAG, your QA chatbot can now produce more customized outputs by grounding the generation in retrieval results!
To wrap things up, let's put everything together:
...
class Chatbot:
    def __init__(self):
        self.retriver = Retriver()
        self.generator = Generator()

    def answer(self, question, k=3):
        # Fetch the top-k chunks for the question, then answer using them as context.
        retrieval_results = self.retriver.get_retrieval_results(question, k)
        return self.generator.generate_response(question, retrieval_results)

# Creating an instance of the Chatbot class
chatbot = Chatbot()

while True:  # press Ctrl+C to quit
    user_input = input("You: ")  # Taking user input from the CLI
    response = chatbot.answer(user_input)
    print(f"Chatbot: {response}")
That's all folks! You just built your very first RAG-based chatbot.
Conclusion
In this article, you've learned what RAG is, some use cases for RAG, and how to build your own RAG-based LLM application. However, you might have noticed that building your own RAG application is pretty complicated, and indexing your data is often a non-trivial task. Luckily, there are existing open-source frameworks like LangChain and LlamaIndex that allow you to implement what we've demonstrated in a much simpler way.
If you like the article, don't forget to give us a star on GitHub: https://github.com/confident-ai/deepeval
You can also find the full code example here: https://github.com/confident-ai/blog-examples/tree/main/rag-llm-app
Till next time!