Quick tip: SingleStoreDB integration with LangChain

veryfatboy

Akmal Chaudhri

Posted on June 29, 2023

Abstract

Recently, SingleStoreDB has been integrated with LangChain. In this short article, we'll walk through a quick example to demonstrate the integration and how easy it is to use these two technologies together.

The notebook file used in this article is available on GitHub.

Introduction

LangChain is a software development framework designed to simplify the creation of applications using Large Language Models (LLMs). Here, we'll streamline the example from a previous article, which was written before the SingleStoreDB LangChain integration was announced, and show how easy it is to use SingleStoreDB and LangChain together.

As described in the previous article, we'll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.

Fill out the Notebook

First, we'll install some libraries:

!pip install langchain --quiet
!pip install langchain-community --quiet
!pip install langchain-openai --quiet
!pip install nltk --quiet
!pip install openai --quiet
!pip install pdf2image --quiet
!pip install pdfminer.six --quiet
!pip install unstructured==0.10.14 --quiet

Next, we'll read in a PDF document. This is an article by Neal Leavitt titled "Whatever Happened to Object-Oriented Databases?" OODBs were an emerging technology during the late 1980s and early 1990s. We'll add leavcom.com to the firewall when prompted. Once the address has been added to the firewall, we'll read the PDF file:

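from langchain_community.document_loaders import OnlinePDFLoader

# Download the PDF and parse it into a list of Document objects
loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()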

LangChain's OnlinePDFLoader downloads the file and extracts its text in one step, which makes reading a PDF file easier.

Next, we'll get some data on the document:

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

The output should be:

You have 1 document(s) in your data
There are 13040 characters in your document

We'll now split the document into chunks ("pages") of at most 2,000 characters each, with a 20-character overlap between consecutive chunks:

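from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunks hold at most 2,000 characters; neighbours share up to 20 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 2000,
    chunk_overlap = 20
)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} pages")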
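
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is a simplification and not how RecursiveCharacterTextSplitter is implemented: the real splitter prefers to break on separators such as paragraphs and sentences, so its chunk count will generally differ. The chunk_text function and the synthetic sample text are hypothetical, used only for illustration.

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=20):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a naive
    sliding window, not LangChain's separator-aware algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic text with the same length as the document in this article
sample = "".join(chr(97 + i % 26) for i in range(13040))
chunks = chunk_text(sample)
print(f"You have {len(chunks)} chunks")
print(chunks[0][-20:] == chunks[1][:20])  # neighbouring chunks overlap
```

The overlap means that a sentence cut at a chunk boundary still appears, at least partly, in the next chunk, which can help retrieval when the relevant text straddles two chunks.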

Next, we'll set our OpenAI API Key:

import getpass, os
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Then we'll use LangChain's OpenAIEmbeddings to generate vector embeddings and store the text with its embeddings in the database. This is much simpler using the LangChain integration:

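from langchain_community.vectorstores import SingleStoreDB
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model = "text-embedding-3-small")

# Embed each chunk and store the text and its vector in the pdf_docs table.
# Connection details are assumed to come from the notebook environment.
docsearch = SingleStoreDB.from_documents(
    texts,
    embedding,
    table_name = "pdf_docs",
    distance_strategy = "DOT_PRODUCT",
)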
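
The DOT_PRODUCT distance strategy scores two vectors by their dot product. As far as the OpenAI documentation describes, its embeddings are normalized to length 1, in which case the dot product and cosine similarity give the same value. The following plain-Python sketch illustrates that relationship with two made-up two-dimensional vectors; the helper functions and the vectors are hypothetical and are not part of LangChain or SingleStoreDB.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors standing in for real embeddings
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors the two scores agree
print(dot(a, b), cosine(a, b))
```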

We can now ask a question, as follows:

query_text = "Will object-oriented databases be commercially successful?"

docs = docsearch.similarity_search(query_text)

print(docs[0].page_content)

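The integration again keeps this simple: a single similarity_search call embeds the query text and returns the most similar chunks stored in the database.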
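
Conceptually, a similarity search ranks the stored chunks by their score against the query vector and returns the best few. The sketch below shows that idea over a small in-memory list. It is only an illustration under stated assumptions: the top_k function, the three-dimensional vectors, and the row texts are all hypothetical, and in the article's setup the database performs this ranking itself.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vector, rows, k=2):
    """Return the k (text, vector) rows with the highest dot-product score."""
    return heapq.nlargest(k, rows, key=lambda row: dot(query_vector, row[1]))


# Hypothetical stored rows: chunk text plus a made-up embedding
rows = [
    ("Unrelated text", [0.0, 0.0, 1.0]),
    ("OODBs retain niche markets", [0.9, 0.1, 0.0]),
    ("Relational databases dominate", [0.6, 0.8, 0.0]),
]
query_vector = [1.0, 0.0, 0.0]

best = top_k(query_vector, rows, k=2)
for text, _ in best:
    print(text)
```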

Finally, we can use a GPT model to provide an answer, passing it the earlier question together with the most similar chunk:

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment
client = OpenAI()

prompt = f"The user asked: {query_text}. The most similar text from the document is: {docs[0].page_content}"

response = client.chat.completions.create(
    model = "gpt-4o-mini",
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response.choices[0].message.content)

Here is some example output:

While object-oriented databases are still in use and have solid niche markets,
they have not gained as much commercial success as relational databases.
Observers previously anticipated that OO databases would surpass relational
databases, especially with the emergence of multimedia data on the internet,
but this prediction did not come to fruition. However, OO databases continue
to be used in specific fields, such as CAD and telecommunications. Experts
have varying opinions on the future of OO databases, with some predicting
further decline and others seeing potential growth.
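
The prompt above uses only the single most similar chunk. One possible variation, sketched here as an assumption and not as part of the original notebook, is to include several retrieved chunks while capping the total context size. The build_prompt function and its max_chars parameter are hypothetical names chosen for illustration.

```python
def build_prompt(query_text, chunks, max_chars=6000):
    """Combine the question with as many chunks as fit in max_chars."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        # Stop before the context budget is exceeded
        if used + len(chunk) > max_chars:
            break
        parts.append(f"[{i}] {chunk}")
        used += len(chunk)
    context = "\n\n".join(parts)
    return (
        f"The user asked: {query_text}\n\n"
        f"The most similar passages from the document are:\n\n{context}"
    )


# Made-up chunk texts standing in for retrieved passages
example = build_prompt(
    "Will object-oriented databases be commercially successful?",
    ["First retrieved passage.", "Second retrieved passage."],
)
print(example)
```

In the notebook, this could be called as build_prompt(query_text, [d.page_content for d in docs]) and the result passed as the user message in place of prompt.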

Summary

Comparing the solution in this article with the previous one, we can see that the LangChain integration is simpler. The framework abstracts the database access, allowing us to focus on the business problem, and provides a compelling, time-saving solution.
