Alfred Nutile
Posted on June 24, 2024
Make sure to follow me on YouTube
This post shows how easy it is to get going with Laravel, vectorized data, and LLM chat. It can be the foundation of a RAG system. There are links to the code and more.
Retrieval-augmented generation (RAG): an architectural approach that can improve the efficacy of large language model (LLM) applications.
The repo is here. The main branch will get you going on a fresh install of Laravel. Copy .env.example to .env and you can get started; just follow along.
Follow Along or Watch the Video COMING SOON!
Step 1 Setting up the Vector Extension
NOTE: After each pull, run composer install. If that is not enough, run composer dump-autoload.
You can see the branch here: https://github.com/alnutile/laravelrag/tree/vector_setup
Once you have Laravel set up and Postgres running (via Herd, DBngin, or the Postgres app), create the database with TablePlus, the command line, or whatever tool you like. In this example it is named "laravelrag".
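If you prefer the command line, something like this will do it (assuming a local Postgres with the stock createdb tool on your PATH):

createdb laravelrag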
Now we are going to install this library, which will set up the vector extension for us.
composer require pgvector/pgvector
php artisan vendor:publish --tag="pgvector-migrations"
php artisan migrate
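For reference, the published migration does very little; it is essentially just enabling the extension. A minimal sketch of what it runs:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // Enable the pgvector extension so Postgres understands the vector type
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
    }

    public function down(): void
    {
        DB::statement('DROP EXTENSION IF EXISTS vector');
    }
};
```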
Step 2 Now for the Model
The Branch is https://github.com/alnutile/laravelrag/tree/model_setup
We will keep it simple here and have a model named Chunk.
This is where we store "chunks" of a document. In our example we will chunk up a long text document to keep things simple. But in the end all things become text: PDF, PPT, Docx, etc.
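As a rough sketch, the chunks table might look like the following; the column names here are my illustration, not necessarily the repo's exact schema (the 1024 dimensions match the mxbai-embed-large model we will pull in Step 3):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('chunks', function (Blueprint $table) {
            $table->id();
            $table->longText('content');           // the chunk of text itself
            $table->unsignedInteger('sort_order');  // the chunk's position in the source document
            $table->vector('embedding', 1024);      // mxbai-embed-large returns 1024 dimensions
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('chunks');
    }
};
```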
You will see in the code that it is not about pages so much as chunks of a given size with a given overlap of content.
In this code we default to 600-character chunks with a bit of overlap; you can see the code here.
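The idea, as a minimal sketch (my own illustration of the approach, not the repo's exact code):

```php
/**
 * Split text into fixed-size chunks with some overlap, so content
 * cut at a chunk boundary still appears intact in the next chunk.
 *
 * @return array<int, string>
 */
function chunkText(string $text, int $size = 600, int $overlap = 100): array
{
    $chunks = [];
    $step = $size - $overlap;

    for ($start = 0; $start < mb_strlen($text); $start += $step) {
        $chunks[] = mb_substr($text, $start, $size);
    }

    return $chunks;
}
```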
Step 3 Add the LLM Drivers
Repo Link is https://github.com/alnutile/laravelrag/tree/llm_driver
NOTE: After each pull, run composer install. If that is not enough, run composer dump-autoload.
We bring in my LLMDriver folder, which is not an official package (sorry, just too lazy), and then some other libraries:
composer require spatie/laravel-data laravel/pennant
I am going to use my LLM Driver for this, plugging in Ollama first and Claude later.
But first, get Ollama going on your machine; you can read about it here.
We are going to pull llama3 and mxbai-embed-large (for embedding data):
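ollama pull llama3
ollama pull mxbai-embed-large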
Or just use your API credentials for OpenAI; it should make sense when you see the code and the config file config/llmdriver.php.
Just set the key value in your .env, or check out config/llmdriver.php for more options.
LLM_DRIVER=ollama
Now let's open Tinkerwell (I want to avoid building a UI so we can focus on the concept) https://tinkerwell.app/
Load up the provider in bootstrap/providers.php:
```php
<?php

return [
    App\Providers\AppServiceProvider::class,
    \App\Services\LlmServices\LlmServiceProvider::class,
];
```
OK, so we see it is working. Now let's chunk a document.
NOTE: Ideally this would all run in Horizon or queue jobs to deal with a ton of details like timeouts. We will see what happens if we just go at it this way for this demo.
Also keep an eye on the tests folder; I have some "good" examples of how to test your LLM-centric applications, like tests/Feature/ChunkTextTest.php.
OK, now we run the command to embed the data.
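In Tinkerwell, that step boils down to something like this; a sketch, where LlmDriverFacade and the shape of $results are stand-ins for whatever the repo's driver actually exposes (embedData is the method we will meet again below):

```php
use App\Models\Chunk;
use Pgvector\Laravel\Vector;

Chunk::whereNull('embedding')->each(function (Chunk $chunk) {
    // Ask the embedding model for a vector representing this chunk of text
    $results = LlmDriverFacade::driver('ollama')->embedData($chunk->content);

    // Store it in the pgvector column
    $chunk->update(['embedding' => new Vector($results->embedding)]);
});
```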
And now we have a ton of chunked data!
The columns are for different embedding sizes, depending on the embedding models you are using. I got some feedback here and went the route you see above.
Now let's chat with the data!
Step 4 Chatting with your Data
OK, we want the user to ask the LLM a question, but the LLM needs "context" and a prompt that reduces drift (drift being when an LLM makes up answers). I have seen drift all but eliminated in these systems.
First let’s vectorize the input so we can search for related data.
Since we embedded the data, the question also gets embedded, or vectorized, so we can use it to do the search.
So we take the text question and pass it to the embedding API (Ollama and OpenAI both offer this).
Here is the code so you can see how simple it really is with HTTP.
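Stripped down to just the HTTP call, it is roughly this; the endpoint and payload shape are Ollama's documented embeddings API, while the surrounding code is my own sketch:

```php
use Illuminate\Support\Facades\Http;

$response = Http::post('http://localhost:11434/api/embeddings', [
    'model' => 'mxbai-embed-large',
    'prompt' => 'How do I register a service provider in Laravel?',
]);

// Ollama responds with { "embedding": [0.12, -0.03, ...] }
$embedding = $response->json('embedding');
```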
You will see I use Laravel Data from Spatie, so no matter the LLM service it is always the same type of data in and out!
Now we use the distance query to do a few things. Let's break it down.
We take the results that embedData gives us and pass them into the query, using the Vector class from the Pgvector\Laravel\Vector library to format them:
use Pgvector\Laravel\Vector;
new Vector($value);
Then we use that in the distance query:
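A sketch of what that query can look like using pgvector's cosine distance operator (<=>); the embedding column name here is my placeholder:

```php
use App\Models\Chunk;
use Pgvector\Laravel\Vector;

$vector = new Vector($embedding);

// <=> is cosine distance in pgvector; the smallest distance is the best match
$results = Chunk::query()
    ->orderByRaw('embedding <=> ?', [$vector])
    ->take(5)
    ->get();
```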
I used cosine distance since I feel the results have been a bit better. Why? I did a bit of ChatGPT work to decide which one and why. Here are some results:
The order of effectiveness for similarity metrics can vary depending on the nature of the data and the specific use case. However, here’s a general guideline based on common scenarios:
1. Cosine Similarity: Cosine similarity is often considered one of the most effective metrics for measuring similarity between documents, especially when dealing with high-dimensional data like text documents. It's robust to differences in document length and is effective at capturing semantic similarity.
2. Inner Product: Inner product similarity is another metric that can be effective, particularly for certain types of data. It measures the alignment between vectors, which can be useful in contexts where the direction of the vectors is important.
3. L2 (Euclidean) Distance: L2 distance is a straightforward metric that measures the straight-line distance between vectors. While it's commonly used and easy to understand, it may not always be the most effective for capturing complex relationships between documents, especially in high-dimensional spaces.
In summary, the order of effectiveness is typically Cosine Similarity > Inner Product > L2 Distance. However, it’s important to consider the specific characteristics of your data and experiment with different metrics to determine which one works best for your particular application.
OK, back to the example. So now we have our question vectorized and we have search results. The code also takes a moment to knit the chunks back together with their siblings, so instead of getting just a chunk we get the chunk before and after it as well: https://github.com/alnutile/laravelrag/blob/chat_with_data/app/Services/LlmServices/DistanceQuery.php#L33
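Conceptually the sibling step looks something like this (a simplified illustration using the sort_order placeholder from the earlier sketch; see the linked DistanceQuery class for the real thing):

```php
use App\Models\Chunk;

// For each matched chunk, pull the one before and after it as well,
// then join everything into one block of context text.
$context = $results->map(function (Chunk $chunk) {
    return Chunk::whereBetween('sort_order', [
            $chunk->sort_order - 1,
            $chunk->sort_order + 1,
        ])
        ->orderBy('sort_order')
        ->pluck('content')
        ->implode(' ');
})->implode("\n\n");
```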
Now that we have the results, we are going to build a prompt. This is tricky since it takes time to get right, so you might want to pull it into ChatGPT or Ollama and experiment a bit. The key here is setting the temperature to 0 to keep the system from drifting. That is not easy yet in Ollama: https://github.com/ollama/ollama/issues/2505
![results](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oulye3gg5rtfbh13jzap.png)
Ok let’s break this down.
Item 1 the Prompt
Here we define a Role, Task, and Format (JSON, Markdown, table, etc.). Check out https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/ for some tips!
Item 2 the Input
Here we pass the original text question to help the LLM understand the user's request.
Item 3 the Context
Context is key. "Garbage in, garbage out" is still the rule here. Put good data into your RAG system. In this example I imported some Laravel docs. But this is the data that comes from the distance query!
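Putting items 1 through 3 together, the prompt ends up shaped roughly like this; my paraphrase of the structure, not the repo's exact wording:

```php
$prompt = <<<PROMPT
Role: You are a helpful assistant answering questions about the documents provided.
Task: Answer the user's question using ONLY the context below. If the context does
not contain the answer, say you do not have enough information.
Format: Respond in Markdown.

Question:
{$question}

Context:
{$context}
PROMPT;
```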
Item 4 LLM
We are just using a completion API here. This is not a "chat," which is an array of questions and answers, though that would work as well; this is just the prompt we built, passed in. I am using the Claude driver to show how easily we can switch systems. Also, I feel Ollama, unless you set the temperature, is a bit trickier to keep on track right now. And Claude is FAST!
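The call itself is then a one-liner against the driver; again, the facade and method names are stand-ins for what the repo exposes:

```php
// Temperature 0 keeps the output deterministic and cuts down on drift
$response = LlmDriverFacade::driver('claude')->completion($prompt);

echo $response->content;
```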
Item 5 The Answer
As seen above, you can get a sense of the answer, but I just want to share that sometimes the LLM will point out (see below) that it does not have enough data to answer your question.
Here is another example I like to share with people
Just more “evidence” of what a good RAG system can do.
Wrapping it up
That is really how "easy" it is to get a RAG system going. LaraLlama.io has a ton more detail you can dig into, but the code base I share in this article is intentionally very simple.
The next post will cover tools/functions, extending this code further. There are so many ways to use this in applications; I list a bunch of use cases here: https://docs.larallama.io/use-cases.html
The code is all here: https://github.com/alnutile/laravelrag. You can work through the branches, the last one being https://github.com/alnutile/laravelrag/tree/chat_with_data
Make sure to follow me on YouTube https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9
And see the list below for more ways to stay in touch!
📺 YouTube Channel — https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9
📖 The Docs — https://docs.larallama.io/
🚀 The Site — https://www.larallama.io
🧑🏻💻 The Code — https://github.com/LlmLaraHub/laralamma
📰 The Newsletter — https://sundance-solutions.mailcoach.app/larallama-app
🖊️ Medium — https://medium.com/@alnutile
🤝🏻 LinkedIn — https://www.linkedin.com/in/alfrednutile/
📺 YouTube Playlist — https://www.youtube.com/watch?v=KM7AyRHx0jQ&list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq
💬 Discussions — https://github.com/orgs/LlmLaraHub/discussions