Creating a Knowledge-Based Chatbot with OpenAI Embedding API, Pinecone, and Langchain.js
Mael Kerichard
Posted on April 9, 2023
In this tutorial, we'll walk you through the process of creating a knowledge-based chatbot using the OpenAI Embedding API, Pinecone as a vector database, and langchain.js as a large language model (LLM) framework. The chatbot will accept URLs, ingest their content as its knowledge base, and answer questions based on that knowledge.
Introduction
To create our knowledge-based chatbot, we will use the following technologies:
- Pinecone: A vector database that helps us store and query embeddings.
- OpenAI Embedding API: An API that provides embeddings for text inputs.
- langchain.js: A JavaScript framework for building LLM applications that makes it easier to work with Pinecone and OpenAI.
What is a Vector Database?
Vector databases are used to store and query vectors efficiently. They allow you to search for similar vectors based on their similarity in a high-dimensional space. In this tutorial, we will use Pinecone as our vector database.
Embeddings are dense vector representations of data, such as text, images, or audio. In our case, we'll be using text embeddings generated by the OpenAI Embedding API. These embeddings help us find semantically similar content in the vector space.
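To make "similarity in a high-dimensional space" concrete, here is a minimal cosine-similarity function. The two-dimensional vectors are toy stand-ins for the 1536-dimensional OpenAI embeddings; Pinecone computes this kind of metric for you at scale:

```typescript
// Cosine similarity: 1 means the vectors point in the same direction
// (semantically similar content), 0 means they are orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```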
Prerequisites
Before we dive into the details, make sure you have the following:
- Familiarity with JavaScript, TypeScript, and Svelte
- Basic understanding of web scraping and natural language processing (NLP)
- Node.js and NPM installed on your system
- An OpenAI API key
- A Pinecone API key
Overview
Our chatbot will consist of the following components:
- Web scraper: to extract content from given URLs
- Text splitter: to split the acquired text into chunks for processing
- Embedding generator: to create embeddings from the text chunks using OpenAI's Embedding API
- Pinecone vector store: to store and retrieve embeddings efficiently
- Langchain LLM: to provide question-answering capabilities based on the embeddings
Let's dive into each component.
1. Web Scraper
To build the web scraper, we will use Puppeteer, a Node.js library that provides a high-level API to control headless Chrome or Chromium browsers.
```typescript
import TurndownService from 'turndown';
import type { Browser } from 'puppeteer';

const turndownService = new TurndownService();

async function scrape_researchr_page(url: string, browser: Browser): Promise<string> {
  const page = await browser.newPage();
  await page.setJavaScriptEnabled(false);
  await page.goto(url);
  const element = await page.waitForSelector('#content > div.row > div', {
    timeout: 100
  });
  if (!element) {
    throw new Error('Could not find element');
  }
  // keep only content elements (like p, h1, h2, h3, h4, h5, h6, li, blockquote, pre, code, table, dl, div)
  await element.evaluate((element) => {
    const elements = element.querySelectorAll(
      '*:not(p, h1, h2, h3, h4, h5, h6, li, blockquote, pre, code, table, dl, div)'
    );
    for (let i = 0; i < elements.length; i++) {
      elements[i].parentNode?.removeChild(elements[i]);
    }
  });
  const html_of_element = await element.evaluate((element) => element.innerHTML);
  return turndownService.turndown(html_of_element);
}
```
Of course, you will need to adapt the scraper to your own pages.
At the end of the function, we use the turndown library, which converts HTML into Markdown. We added this step because Markdown is easier to split later, and GPT has a better understanding of Markdown than HTML. It is also a lighter syntax than HTML, meaning fewer tokens and thus cheaper API calls.
2. Text Splitter
The next step is to split the acquired text into smaller chunks for processing. We will use the MarkdownTextSplitter class from the langchain/text_splitter package. This class takes chunkSize and chunkOverlap parameters to control the size and overlap of the generated text chunks.
```typescript
const textSplitter = new MarkdownTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 20
});

const chunks = await textSplitter.splitText(markdowns.join('\n\n'));
```
Ideally, we want one piece of information per chunk. With very structured Markdown files, one chunk could correspond to one subsection. In our case, the Markdown comes from HTML and is poorly structured, so we rely on a fixed chunk size instead, which makes our knowledge base less reliable (a single piece of information could be split across two chunks).
3. Embedding Generator
We will use OpenAI's Embedding API to generate embeddings for our text chunks. We will create an instance of the OpenAIEmbeddings class from the langchain/embeddings package and pass it our OpenAI API key via the environment variable OPENAI_API_KEY=<your_openai_api_key>.
```typescript
const embeddingModel = new OpenAIEmbeddings({ maxConcurrency: 5 });
```
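As a usage sketch (the query string is illustrative, and OPENAI_API_KEY must be set in the environment), embedding a single string is one call:

```typescript
import { OpenAIEmbeddings } from 'langchain/embeddings';

const embeddingModel = new OpenAIEmbeddings({ maxConcurrency: 5 });

// embedQuery embeds one string; embedDocuments batches an array of chunks.
const vector = await embeddingModel.embedQuery('How do I submit a paper?');
console.log(vector.length); // 1536 dimensions
```

In practice the PineconeStore in the next section calls embedDocuments for us, so we rarely invoke the model directly.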
4. Pinecone Vector Store
Pinecone is a vector database designed for efficient storage and retrieval of high-dimensional vectors. We will use the PineconeStore class from the langchain/vectorstores package to store our generated embeddings.
We first initialize the client and connect to the index created in the Pinecone dashboard (the vectors have 1536 dimensions).
```typescript
import { PineconeClient } from '@pinecone-database/pinecone';

const client = new PineconeClient();
await client.init({
  apiKey: PINECONE_API_KEY,
  environment: PINECONE_ENVIRONMENT
});

export const pineconeIndex = client.Index(PINECONE_INDEX);
```
This code initializes the Pinecone client and connects to the index; the PineconeStore class then takes care of getting embeddings from the OpenAI API and storing them in Pinecone.
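The storage call itself could look like the following sketch, assuming chunks is the array produced by the text splitter and pineconeIndex is the export from the snippet above:

```typescript
import { PineconeStore } from 'langchain/vectorstores';
import { OpenAIEmbeddings } from 'langchain/embeddings';

// Embeds every chunk via the OpenAI API and upserts the resulting
// vectors into the Pinecone index in one call.
const vectorStore = await PineconeStore.fromTexts(
  chunks,
  chunks.map((_, i) => ({ chunkIndex: i })), // one metadata object per chunk
  new OpenAIEmbeddings({ maxConcurrency: 5 }),
  { pineconeIndex }
);
```

The returned vectorStore is the instance we hand to the question-answering chain in the next section.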
5. Langchain
To provide question-answering capabilities based on our embeddings, we will use the VectorDBQAChain class from the langchain/chains package. This class combines a large language model (LLM) with a vector database to answer questions based on the content in the vector database.
In our example, we use the ChatOpenAI class from the langchain/chat_models package as our LLM, and the PineconeStore instance as our vector database.
```typescript
const model = new ChatOpenAI({
  temperature: 0.9,
  openAIApiKey: OPENAI_API_KEY,
  modelName: 'gpt-3.5-turbo'
});

const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
  k: 5,
  returnSourceDocuments: true
});
```
With these components in place, we can now create a SvelteKit endpoint to handle user requests. The POST request handler will receive the user's query, call the VectorDBQAChain, and return the chatbot's response along with the source documents (if available).
```typescript
import { error, json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';

export const POST = (async ({ request }) => {
  const { text } = await request.json();
  if (!text) {
    throw error(400, 'Missing text');
  }
  if (text.length > 200) {
    throw error(400, 'Text too long');
  }
  try {
    const response = await chain.call({ query: text });
    const { text: responseText, sourceDocuments } = response;
    return json({
      text: responseText,
      sources: sourceDocuments
    });
  } catch (e) {
    console.log(e);
    throw error(500, 'Internal Server Error');
  }
}) satisfies RequestHandler;
```
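On the client side, calling this endpoint is a plain fetch (the /api/chat path is an assumption; use whatever route your endpoint lives under):

```typescript
// Sends the user's question to the SvelteKit endpoint and reads the answer
// plus the source documents the chain retrieved from Pinecone.
const res = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'What is this site about?' })
});

const { text, sources } = await res.json();
console.log(text, sources);
```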
Conclusion
We've demonstrated how to build a knowledge-based chatbot using the OpenAI Embedding API, Pinecone as a vector database, and Langchain. This chatbot can acquire knowledge from given URLs and answer user queries based on that knowledge.
By combining web scraping, text processing, embeddings, vector databases, and LLMs, you can create powerful chatbots that can learn and provide useful answers based on the content they consume.
Find the reference code on GitHub.