PDF RAG Demo: Building Simplified AI Workflows with Couchbase Shell
Brian King
Posted on November 29, 2024
Previously, we showed how to use Couchbase RAG capabilities through a Python app that lets the user ‘chat’ with their PDF or with X. It’s simple to build, but can it be made simpler still? I have been playing a lot with Couchbase Shell recently, and it should let me do something similar.
Set up a scope and collection
I am assuming you are already familiar with Couchbase Shell (cbsh), and have a configured cluster and model.
Create and select a scope and collection, and then create a primary index:
> scopes create pdf
> cb-env scope pdf
> collections create pdf
> cb-env collection pdf
> query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
Turn a PDF into chunked text
There are a variety of tools for converting a PDF to text. On most Linux distributions, you should find pdftotext.
> pdftotext ~/monopolyInstruction.pdf
This will create a text version of the file with the same path, but with a .txt extension.
With Nushell (cbsh is based on Nushell) it’s easy to split text thanks to the split command. The hard part is finding the right delimiter to chunk the file. Fortunately, Nushell supports multiline strings, so I copied and pasted the text that sits between two paragraphs in the file. You should be able to do something more sophisticated using regex; that’s the difference between blog material and production.
> open ~/monopolyInstruction.txt |split row "
:::
::: "|wrap text
This gives you a table of text strings. To import them into Couchbase, we wrap each string in a text field inside a content JSON object, add a randomly generated UUID as the id, and upsert the result.
> open ~/monopolyInstruction.txt |split row "
:::
::: "|wrap text |wrap content | each { insert id { random uuid } } | doc upsert
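As a rough illustration of the regex approach mentioned earlier, here is a minimal sketch that splits on blank lines instead of a pasted delimiter. It uses only core Nushell commands, so it does not touch the cluster. The sample string, the `chunks` and `docs` variable names, and the choice of "one or more blank lines" as the paragraph boundary are assumptions made for this example; a real pdftotext output may need a different pattern.

```nu
# Hypothetical sample text standing in for the pdftotext output.
let sample = "Rule one.\nStill rule one.\n\nRule two.\n\n\nRule three.\n"

# Split wherever blank lines separate paragraphs, trim each chunk,
# and drop any empty strings left over.
let chunks = (
    $sample
    | split row --regex '\n\s*\n'
    | str trim
    | where $it != ""
)

# Same wrapping as in the post: each row becomes
# {content: {text: <chunk>}, id: <random uuid>}.
let docs = (
    $chunks
    | wrap text
    | wrap content
    | each {|row| $row | insert id (random uuid) }
)

# Inspect the documents before piping them to `doc upsert`.
$docs | to json
```

Replacing `$sample` with `open ~/monopolyInstruction.txt` and piping `$docs` into `doc upsert` should give the same flow as the command above, with the blank-line pattern in place of the pasted delimiter.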
The next step is to create embeddings, or vector representations of the text:
> query "SELECT meta().id as id, p.* from pdf as p" | wrap content| vector enrich-doc text | doc upsert
Then create the vector search index. Here it is called pdf, indexes the textVector field with 1536-dimension vectors, and uses l2_norm as the similarity algorithm, since that is the default.
> vector create-index pdf textVector 1536
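For reference, l2_norm is the Euclidean (L2) distance between two vectors. The symbols below are introduced for this note only: q stands for the question's embedding and x for a stored chunk's embedding, each with 1536 components. A smaller distance means the chunk is closer to the question.

```latex
d(q, x) = \sqrt{\sum_{i=1}^{1536} \left( q_i - x_i \right)^2}
```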
I have imported the rules of Monopoly, and I am asking how to get out of jail. As in the original example, we compare an answer without context and one with context. First, without context:
> ask "how to get out of jail"
If you or someone you know is in jail and needs to be released, here are some general steps to take:
Contact a lawyer: If you have legal representation or know of a lawyer who can help with your case, reach out to them for assistance in navigating the legal process.
Obtain a bail bond: In many cases, individuals can be released from jail by posting bail. This requires paying a set amount of money to the court, which is typically returned once the individual attends all required court dates.
Attend court hearings: It's important to comply with all court requirements, including attending scheduled hearings and following any conditions of release set by the court.
Seek support: Consider reaching out to family members, friends, or local organizations that may be able to provide assistance or guidance during this challenging time.
Please keep in mind that the process of getting released from jail can vary depending on the specific circumstances of the case and the jurisdiction. It's always best to consult with legal professionals for personalized advice and assistance.
And with context:
> let question = "how to get out of jail"; vector enrich-text $question | vector search pdf textVector | select id |doc get| select content.text | ask $question
Embedding batch 1/1
You can get out of jail by following these methods:
**Roll Doubles:** If you roll a double with the white dice on any of your next three turns, you can immediately move out of Jail. You then move the number of spaces shown by your doubles roll.
**"Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can use it to get out of Jail without rolling doubles. This card can be obtained by purchasing it from another player or drawing it from the Chance or Community Chest cards.
**Pay Fine:** You can also choose to pay a fine of $50 before you roll the dice on either of your next two turns. After paying the fine, you are free to move and continue playing.
Remember, if you do not roll doubles by your third turn or use a "Get Out of Jail Free" card, you must pay the $50 fine to get out of Jail.
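To make the stages of the with-context command easier to follow, the same pipeline can be sketched step by step. This uses only the commands already shown in this post and needs a configured cluster and model to run. The intermediate variable names (`embedded`, `hits`, `context`) are made up for the example, and it assumes each command's output can be captured in a variable and piped onward unchanged.

```nu
let question = "how to get out of jail"

# 1. Turn the question into a vector with the configured model.
let embedded = (vector enrich-text $question)

# 2. Look up the closest chunks in the pdf index on the textVector field.
let hits = ($embedded | vector search pdf textVector)

# 3. Fetch the matching documents and keep only their text.
let context = ($hits | select id | doc get | select content.text)

# 4. Ask the model the question, with the retrieved text as context.
$context | ask $question
```

Inspecting `$hits` or `$context` between steps is a quick way to check which chunks are being handed to the model.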
Let’s simplify this by putting everything in a script. This is the content of ragDemo/ragScript.nu:
# Create the scope, the collection, the primary index and the vector index.
def initRAGPipeline [] {
    scopes create pdf
    cb-env scope pdf
    collections create pdf
    cb-env collection pdf
    query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
    vector create-index pdf textVector 1536
}

# Take text chunks from the pipeline, store them, then add their embeddings.
def storeRAGDoc [] {
    wrap text | wrap content | each { insert id { random uuid } } | doc upsert
    query "SELECT meta().id as id, p.* from `pdf` as p" | wrap content | vector enrich-doc text | doc upsert
}

# Ask the same question without and with the retrieved context.
def myAsk [question: string] {
    let norag = ask $question
    let rag = vector enrich-text $question | vector search pdf textVector | select id | doc get | select content.text | ask $question
    {"rag": $rag, "norag": $norag}
}
You can source the script file and then call those functions:
> source ./ragDemo/ragScript.nu
> initRAGPipeline
> open monopolyInstruction.txt |split row "
:::
::: "| storeRAGDoc
> myAsk "how to get out of jail"
Embedding batch 1/1
╭───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ rag │ Here are the ways to get out of jail in the game of Monopoly: │
│ │ │
│ │ 1. **Roll Doubles:** The most common way to get out of jail is by rolling doubles on your turn. If you roll │
│ │ doubles with the regular white dice on any of your next three turns after being sent to jail, you can immediately move your token out of jail and advance the corresponding │
│ │ number of spaces. Remember that you can only use the white dice for this purpose. │
│ │ │
│ │ 2. **Using "Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can │
│ │ use it to get out of jail without rolling doubles. Simply present the card to the Banker to get out of jail for free. The card is then returned to the bottom of the deck. │
│ │ │
│ │ 3. │
│ │ **Purchase the Card:** If another player has a "Get Out of Jail Free" card and is willing to sell it, you can purchase the card from them at a mutually agreed-upon price. │
│ │ This allows you to get out of jail even if you don't have the card yourself. │
│ │ │
│ │ 4. **Pay the Fine:** If you do not roll doubles within three turns or do not have a "Get Out of │
│ │ Jail Free" card, you must pay a fine of $50 to the Bank before you roll the dice on either of your next two turns. Once you pay the fine, you are immediately released from │
│ │ jail and can move your token as per the dice roll. │
│ │ │
│ │ These are the four main ways to get out of jail in Monopoly. │
│ norag │ If you or someone you know is currently in jail and looking to get released, here are some general steps to consider: │
│ │ │
│ │ 1. Contact a lawyer: A criminal defense attorney can │
│ │ provide guidance on legal options and help navigate the legal process for release. │
│ │ │
│ │ 2. Attend court hearings: It is important to attend all court hearings and follow any │
│ │ conditions set by the court to demonstrate cooperation with the legal system. │
│ │ │
│ │ 3. Consider bail: If bail is an option, you may be able to pay a set amount to be released from │
│ │ jail pending trial. If you cannot afford the bail amount, you may seek assistance from a bail bond agent. │
│ │ │
│ │ 4. Seek alternative options: Depending on the circumstances of your │
│ │ case, there may be alternative options for release such as pretrial services, diversion programs, or supervised release. │
│ │ │
│ │ 5. Follow legal advice: It is crucial to follow the │
│ │ advice of your legal counsel and comply with all legal requirements to increase the chances of a successful release. │
│ │ │
│ │ It's important to note that the process of getting out of │
│ │ jail can vary depending on the specific circumstances of the case and the laws in your jurisdiction. For personalized guidance, it's recommended to speak with a lawyer or │
│ │ legal professional specializing in criminal law. │
╰───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Here, you can see the same kind of result we achieved in the Python RAG demo, but this time using Couchbase Shell. It should be easier to manipulate, change or extend, because you don’t need to deploy an app or know Python. However, it will be less flexible than what you can achieve with Python and LangChain.
If this interests you, stay tuned: more AI and Couchbase Shell content is on the way!
- Learn more about Couchbase Shell and Couchbase vector search capabilities
The post PDF RAG Demo: Building Simplified AI Workflows with Couchbase Shell appeared first on The Couchbase Blog.