Speak Your Queries: How Langchain Lets You Chat with Your Database
Ngonidzashe Nzenze
Posted on May 1, 2023
Imagine effortlessly conversing with your database as if it were a close friend, asking questions and receiving instant, accurate responses. Welcome to the world of Langchain, a groundbreaking package that harnesses the power of LLMs(Large Language Models) to redefine the way you interact with your databases. In this tutorial, we'll explore how to seamlessly connect to a PostgreSQL database and start chatting with it using Langchain. Say goodbye to complex queries and embrace the future of database management – let's dive into the realm of conversational AI and revolutionize your data-driven tasks today!
What is Langchain?
Langchain is a user-friendly tool that helps connect advanced language software, called large language models (LLMs), with other sources of information, like databases. LLMs are amazing at understanding and generating human-like text, but to create truly powerful apps, we need to combine their abilities with other resources. Langchain makes this possible by allowing developers to easily link these language models with external data. This means that everyday people can benefit from smarter, more helpful applications that understand their needs and provide accurate information.
Getting started
I have included all the code for this project on my github.
Lets start by installing all the required packages first. Install langchain
, openai
, psycopg2
and python-environ
with pip:
pip install langchain
pip install openai
pip install psycopg2
pip install python-environ
We are going to be using a postgresql database in this tutorial hence we're installing psycopg2
. I am already assuming that you have postgresql installed on your device and have an open ai account. We are going to use the python-environ
module to manage the API keys and database password.
Create a .env
file and add the keys into it as below:
apikey=your_openai_api_key
dbpass=your_database_password
Setting up the database
Create a database named tasks in pgAdmin. Our tasks are going to consist the task name, the completion status which is either 1 or 0, the due date, completion date, and the task priority ranging from 0 to 3.
Create a file named db.py
and add the following code to it:
import psycopg2
import environ
env = environ.Env()
environ.Env.read_env()
# Establish a connection to the PostgreSQL database
conn = psycopg2.connect(
host='localhost',
port=5433,
user='postgres',
password=env('dbpass'),
database='tasks'
)
# Create a cursor object to execute SQL commands
cursor = conn.cursor()
# Create the tasks table if it doesn't exist
cursor.execute('''CREATE TABLE IF NOT EXISTS tasks
(id SERIAL PRIMARY KEY,
task TEXT NOT NULL,
completed INTEGER,
due_date DATE,
completion_date DATE,
priority INTEGER)''')
# Insert sample tasks into the tasks table
cursor.execute("INSERT INTO tasks (task, completed, due_date, completion_date, priority) VALUES (%s, %s, %s, %s, %s)",
('Finish homework', 1, '2023-05-01', None, 1))
cursor.execute("INSERT INTO tasks (task, completed, due_date, completion_date, priority) VALUES (%s, %s, %s, %s, %s)",
('Buy groceries', 0, '2023-05-03', None, 2))
cursor.execute("INSERT INTO tasks (task, completed, due_date, completion_date, priority) VALUES (%s, %s, %s, %s, %s)",
('Pay bills', 0, '2023-05-05', None, 3))
cursor.execute("INSERT INTO tasks (task, completed, due_date, completion_date, priority) VALUES (%s, %s, %s, %s, %s)",
('Clean house', 0, '2023-05-08', None, 4))
cursor.execute("INSERT INTO tasks (task, completed, due_date, completion_date, priority) VALUES (%s, %s, %s, %s, %s)",
('Exercise', 1, '2023-05-10', None, 5))
# Commit the changes and close the connection
conn.commit()
conn.close()
This file is going to create a table named tasks and insert demo tasks into it. Run the script with python db.py
.
Setting up the chat
Now we are going to create the main console application. First create a file named main.py
and import the relevant modules:
#main.py
from langchain import SQLDatabase, SQLDatabaseChain
from langchain.chat_models import ChatOpenAI
# Setting up the api key
import environ
env = environ.Env()
environ.Env.read_env()
API_KEY = env('apikey')
Under the hood, LangChain works with SQLAlchemy to connect to various types of databases. This means it can work with many popular databases, like MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. To learn more about connecting LangChain to your specific database, you can check the SQLAlchemy documentation for helpful information and requirements.
With postgres
and psycopg2
, the connection string is like this: postgresql+psycopg2://username:password@localhost/mydatabase
.
Add the following to main.py
:
# main.py
# ...
# Setup database
db = SQLDatabase.from_uri(
f"postgresql+psycopg2://postgres:{env('dbpass')}@localhost:5433/tasks",
)
The code above sets up our connection to the postgresql database. Feel free to change it to your own configuration.
Add the following code:
# main.py
# ...
# setup llm
llm = ChatOpenAI(temperature=0, openai_api_key=API_KEY, model_name='gpt-3.5-turbo')
# Create query instruction
QUERY = """
Given an input question, first create a syntactically correct postgresql query to run, then look at the results of the query and return the answer.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
{question}
"""
# Setup the database chain
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
First we define our language model with:
llm = ChatOpenAI(temperature=0, openai_api_key=API_KEY)
We are going to be using the open ai language model.
Next, we create a basic query string, which will tell the model instructions on how we want it to present the data.
QUERY = """
Given an input question, first create a syntactically correct postgresql query to run, then look at the results of the query and return the answer.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
{question}
"""
Finally we, set up the database chain with:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
We want to be able to prompt our database multiple times therefore I'm going to setup a while loop:
# main.py
# ...
def get_prompt():
print("Type 'exit' to quit")
while True:
prompt = input("Enter a prompt: ")
if prompt.lower() == 'exit':
print('Exiting...')
break
else:
try:
question = QUERY.format(question=prompt)
print(db_chain.run(question))
except Exception as e:
print(e)
get_prompt()
The get_prompt
function will keep on asking for input and giving us the result over and over. We can only exit the loop if we type 'exit'. That is it for the setup, let's test it out and see how it does.
Now let's try it out!
Wait a minute🤔, when should I have bought groceries?
How many incomplete tasks do I even have?
Oh I had forgot! I need to clean the house!!😨
In conclusion, we've explored the incredible potential of LangChain to revolutionize the way we interact with databases. By harnessing the power of large language models and making it easy to connect with various databases, LangChain enables users to engage in natural, conversational exchanges with their data. The accompanying screenshots throughout this article have demonstrated the simplicity and effectiveness of this innovative approach. As technology continues to evolve, LangChain stands as a testament to the endless possibilities that emerge when we seamlessly blend AI capabilities with practical, real-world applications. So, embrace the future of database management with LangChain and experience the ease and efficiency of conversing with your data.
Posted on May 1, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.