Jaydeep Biswas
Posted on January 10, 2024
Hey there! Throughout our latest blog series, we've delved into a wide array of subjects. Here's an overview of the topics we've explored thus far:
- Installation and Setup of LangChain
- LangChain's 1st Module: Model I/O
- LangChain's 2nd Module: Retrieval
Exploring LangChain's Agents
Today, I want to dive into this exciting concept called "Agents" in LangChain. It's pretty mind-blowing!
LangChain introduces an innovative idea called "Agents" that takes the concept of chains to a whole new level. Agents use language models to dynamically figure out sequences of actions to perform, making them highly versatile and adaptable. Unlike regular chains, where actions are hardcoded, agents use a language model as a reasoning engine to decide which actions to take and in what order.
The Agent is the main part responsible for decision-making. It harnesses the power of a language model and a prompt to figure out the next steps to achieve a specific objective. The inputs to an agent usually include:
- Tools: Descriptions of available tools (more on this later).
- User Input: The high-level objective or query from the user.
- Intermediate Steps: A history of (action, tool output) pairs executed to reach the current user input.
The result of an agent can either be the next thing to do (AgentActions) or the ultimate reply to give to the user (AgentFinish). An action includes details about a tool and the input needed for that tool.
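To make those two outcome types concrete, here is a plain-Python sketch of the shapes involved. These are illustrative dataclasses only, not LangChain's actual classes (which live in `langchain.schema.agent`):

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class AgentAction:
    tool: str        # name of the tool to invoke
    tool_input: Any  # input to pass to that tool
    log: str         # the model's reasoning text

@dataclass
class AgentFinish:
    return_values: dict  # e.g. {"output": "..."}
    log: str

# A single agent step either requests a tool call...
step: Union[AgentAction, AgentFinish] = AgentAction(
    tool="get_word_length", tool_input={"word": "educa"}, log="I should count letters."
)
# ...or terminates with the final answer for the user.
done = AgentFinish(return_values={"output": "There are 5 letters."}, log="")
```

The runtime loops on steps of the first kind and stops at the second.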
Tools
Tools are interfaces that an agent can use to interact with the world. They allow agents to perform various tasks like searching the web, running shell commands, or accessing external APIs. In LangChain, tools are crucial for expanding the capabilities of agents and helping them achieve diverse tasks.
To use tools in LangChain, you can load them using the following code:
from langchain.agents import load_tools
tool_names = [...]
tools = load_tools(tool_names)
Some tools may need a base Language Model (LLM) for initialization. In such cases, you can pass an LLM like this:
from langchain.agents import load_tools
tool_names = [...]
llm = ...
tools = load_tools(tool_names, llm=llm)
This setup allows you to access a variety of tools and integrate them into your agent's workflows. The complete list of tools, with usage documentation, is available in the LangChain integrations documentation.
Examples of Tools
DuckDuckGo
The DuckDuckGo tool lets you perform web searches using its search engine. Here's an example:
from langchain.tools import DuckDuckGoSearchRun
search = DuckDuckGoSearchRun()
search.run("Manchester United vs Luton Town match summary")
DataForSeo
The DataForSeo toolkit allows you to get search engine results using the DataForSeo API. To use it, you need to set up your API credentials:
import os
os.environ["DATAFORSEO_LOGIN"] = "<your_api_access_username>"
os.environ["DATAFORSEO_PASSWORD"] = "<your_api_access_password>"
Once credentials are set, you can create a DataForSeoAPIWrapper tool to access the API:
from langchain.utilities.dataforseo_api_search import DataForSeoAPIWrapper
wrapper = DataForSeoAPIWrapper()
result = wrapper.run("Weather in Los Angeles")
The DataForSeoAPIWrapper tool fetches search engine results from various sources.
You can customize the type of results and fields returned in the JSON response:
json_wrapper = DataForSeoAPIWrapper(
json_result_types=["organic", "knowledge_graph", "answer_box"],
json_result_fields=["type", "title", "description", "text"],
top_count=3,
)
json_result = json_wrapper.results("Bill Gates")
Specify the location and language for your search results:
customized_wrapper = DataForSeoAPIWrapper(
top_count=10,
json_result_types=["organic", "local_pack"],
json_result_fields=["title", "description", "type"],
params={"location_name": "Germany", "language_code": "en"},
)
customized_result = customized_wrapper.results("coffee near me")
Choose the search engine:
customized_wrapper = DataForSeoAPIWrapper(
top_count=10,
json_result_types=["organic", "local_pack"],
json_result_fields=["title", "description", "type"],
params={"location_name": "Germany", "language_code": "en", "se_name": "bing"},
)
customized_result = customized_wrapper.results("coffee near me")
The search is customized to use Bing as the search engine.
Specify the type of search:
maps_search = DataForSeoAPIWrapper(
top_count=10,
json_result_fields=["title", "value", "address", "rating", "type"],
params={
"location_coordinate": "52.512,13.36,12z",
"language_code": "en",
"se_type": "maps",
},
)
maps_search_result = maps_search.results("coffee near me")
These examples showcase how you can customize searches based on result types, fields, location, language, search engine, and search type.
Shell (bash)
The Shell toolkit gives agents the ability to interact with the shell environment, allowing them to run shell commands. This feature is powerful but should be used carefully, especially in sandboxed environments. Here's how to use the Shell tool:
from langchain.tools import ShellTool
shell_tool = ShellTool()
result = shell_tool.run({"commands": ["echo 'Hello World!'", "time"]})
In this example, the Shell tool runs two shell commands: echoing "Hello World!" and printing timing statistics with `time`.
You can provide the Shell tool to an agent for more complex tasks. Here's an example of an agent using the Shell tool to fetch links from a web page:
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0.1)
shell_tool.description = shell_tool.description + f"args {shell_tool.args}".replace(
"{", "{{"
).replace("}", "}}")
self_ask_with_search = initialize_agent(
[shell_tool], llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
self_ask_with_search.run(
"Download the langchain.com webpage and grep for all urls. Return only a sorted list of them. Be sure to use double quotes."
)
In this scenario, the agent uses the Shell tool to execute a series of commands to fetch, filter, and sort URLs from a web page.
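What the agent's generated pipeline does (fetch the page, grep for URLs, sort them) can be mirrored in plain Python. This is a hand-written equivalent on an inline HTML snippet, since a real run would download the page first:

```python
import re

# Stand-in for the downloaded page; a real run would fetch langchain.com first.
html = '<a href="https://docs.langchain.com">docs</a> <a href="https://blog.langchain.dev">blog</a>'

# Equivalent of the agent's `grep -oE 'https://[^"]+' | sort` pipeline:
urls = sorted(re.findall(r'https://[^"]+', html))
print(urls)
```

The agent arrives at the same kind of extraction, except it writes the shell commands itself from the natural-language request.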
The examples provided showcase some of the tools available in LangChain. These tools ultimately expand the capabilities of agents (explored in the next subsection) and empower them to efficiently perform various tasks. Depending on your project's needs, you can choose the tools and toolkits that best suit your requirements and integrate them into your agent's workflows.
Return to Agents
Let's talk about agents now.
The AgentExecutor is like the engine that runs an agent. It's responsible for calling the agent, making it do actions, giving the agent the results, and doing this in a loop until the agent finishes its task. In simpler terms, it might look something like this:
next_action = agent.get_action(...)
while not isinstance(next_action, AgentFinish):
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action
The AgentExecutor deals with various complexities, like what happens when the agent picks a tool that doesn't exist, handling tool errors, managing what the agent produces, and providing logs at different levels.
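What that extra bookkeeping amounts to can be sketched in plain Python. This uses a hypothetical stub agent and tool registry; the real AgentExecutor handles far more cases:

```python
# Simplified sketch of the safeguards AgentExecutor layers on top of the bare loop.
def run_agent(agent, tools, user_input, max_iterations=5):
    intermediate_steps = []
    for _ in range(max_iterations):
        step = agent(user_input, intermediate_steps)
        if step["type"] == "finish":
            return step["output"]
        tool = tools.get(step["tool"])
        if tool is None:
            # Unknown tool: feed the error back to the agent as an observation.
            observation = f"Error: no tool named {step['tool']!r}"
        else:
            try:
                observation = tool(step["tool_input"])
            except Exception as exc:  # tool errors become observations too
                observation = f"Tool error: {exc}"
        intermediate_steps.append((step, observation))
    return "Agent stopped: iteration limit reached"

# Tiny stub agent: call the tool once, then finish using its result.
def stub_agent(user_input, steps):
    if not steps:
        return {"type": "action", "tool": "get_word_length", "tool_input": "educa"}
    return {"type": "finish", "output": f"The word has {steps[-1][1]} letters."}

print(run_agent(stub_agent, {"get_word_length": len}, "how many letters in educa?"))
```

The iteration cap and the "errors as observations" pattern are exactly the kinds of complexities the real class manages for you.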
Although the AgentExecutor class is the main runtime for agents in LangChain, there are other experimental runtimes like:
- Plan-and-execute Agent
- BabyAGI
- AutoGPT
To understand the agent framework better, let's build a basic agent from scratch and then explore pre-built agents.
Before we dive into building the agent, let's review some key terms and schema:
- AgentAction: This is like a set of instructions for the agent. It includes the tool to use and tool_input, the input for that tool.
- AgentFinish: This indicates the agent has finished its task and is ready to give a response to the user.
- Intermediate Steps: These are records of what the agent did before. They help the agent keep context for future actions.
Now, let's create a simple agent using OpenAI Function Calling. We'll start by making a tool that calculates word length. This is useful because language models sometimes make mistakes when counting word lengths due to tokenization.
First, load the language model:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
Test the model with a word length calculation:
llm.invoke("how many letters in the word educa?")
Define a simple function to calculate word length:
from langchain.agents import tool
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
We've created a tool named get_word_length that takes a word as input and returns its length.
Now, create a prompt for the agent. The prompt guides the agent on how to reason and format the output:
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a very powerful assistant but not great at calculating word lengths.",
),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
To provide tools to the agent, format them as OpenAI function calls:
from langchain.tools.render import format_tool_to_openai_function
tools = [get_word_length]
llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])
Create the agent by defining input mappings and connecting components:
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| llm_with_tools
| OpenAIFunctionsAgentOutputParser()
)
We've created our agent, which understands user input, uses available tools, and formats output.
Interact with the agent:
agent.invoke({"input": "how many letters in the word educa?", "intermediate_steps": []})
Now, let's write a runtime for the agent. The simplest runtime calls the agent, executes actions, and repeats until the agent finishes:
from langchain.schema.agent import AgentFinish
user_input = "how many letters in the word educa?"
intermediate_steps = []
while True:
output = agent.invoke(
{
"input": user_input,
"intermediate_steps": intermediate_steps,
}
)
if isinstance(output, AgentFinish):
final_result = output.return_values["output"]
break
else:
print(f"TOOL NAME: {output.tool}")
print(f"TOOL INPUT: {output.tool_input}")
tool = {"get_word_length": get_word_length}[output.tool]
observation = tool.run(output.tool_input)
intermediate_steps.append((output, observation))
print(final_result)
To simplify this, use the AgentExecutor class. It encapsulates agent execution and offers error handling, early stopping, tracing, and other improvements:
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "how many letters in the word educa?"})
The AgentExecutor makes it easier to interact with the agent and simplifies the execution process.
Memory in Agents
The agent we've made so far doesn't remember past conversations, making it stateless. To enable follow-up questions and continuous conversations, we need to add memory to the agent. Here are the two steps involved:
- Add a memory variable in the prompt to store chat history.
- Keep track of the chat history during interactions.
Let's start by adding a memory placeholder in the prompt:
from langchain.prompts import MessagesPlaceholder
MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a very powerful assistant but not great at calculating word lengths.",
),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
Now, create a list to track the chat history:
from langchain.schema.messages import HumanMessage, AIMessage
chat_history = []
In the agent creation step, include the memory as well:
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
"chat_history": lambda x: x["chat_history"],
}
| prompt
| llm_with_tools
| OpenAIFunctionsAgentOutputParser()
)
When running the agent, make sure to update the chat history:
input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend([
HumanMessage(content=input1),
AIMessage(content=result["output"]),
])
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})
This lets the agent maintain a conversation history and answer follow-up questions based on past interactions.
Congratulations! You've successfully created and executed your first end-to-end agent in LangChain. To explore LangChain's capabilities further, you can delve into:
- Different agent types supported.
- Pre-built Agents
- How to work with tools and tool integrations.
Agent Types
LangChain offers various agent types, each suited for specific use cases. Here are some available agents:
- Zero-shot ReAct: Chooses tools based on their descriptions using the ReAct framework. Versatile and requires tool descriptions.
- Structured input ReAct: Handles multi-input tools, suitable for tasks like web browsing. Uses a tool's argument schema for structured input.
- OpenAI Functions: Designed for models fine-tuned for function calling, compatible with models like gpt-3.5-turbo-0613 and gpt-4-0613.
- Conversational: Tailored for conversational settings, uses ReAct for tool selection, and employs memory to remember previous interactions.
- Self-ask with search: Relying on a single tool, "Intermediate Answer," it looks up factual answers to questions.
- ReAct document store: Interacts with a document store using the ReAct framework, requiring "Search" and "Lookup" tools.
Explore these agent types to find the one that best suits your needs in LangChain. These agents allow you to bind a set of tools within them to handle actions and generate responses.
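Under the hood, the ReAct-style agents prompt the model to emit text like `Action: <tool>` / `Action Input: <input>` or `Final Answer: <answer>`, and an output parser turns that text into an action or a finish. Here is a simplified sketch of that parsing step (LangChain's real output parsers are considerably more robust):

```python
import re

# Simplified parser for ReAct-style model output.
def parse_react(text: str):
    match = re.search(r"Action:\s*(.+?)\nAction Input:\s*(.+)", text)
    if match:
        # The model wants to call a tool.
        return {"tool": match.group(1).strip(), "tool_input": match.group(2).strip()}
    match = re.search(r"Final Answer:\s*(.+)", text)
    if match:
        # The model is done and is answering the user.
        return {"output": match.group(1).strip()}
    raise ValueError("Could not parse model output")

step = parse_react("Thought: I need to search.\nAction: Search\nAction Input: LangChain agents")
```

This is why each agent type pairs a specific prompt format with a matching parser: the two must agree on the text structure.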
Prebuilt Agents
Let's continue our exploration of agents, focusing on prebuilt agents available in LangChain.
LangChain Gmail Toolkit
LangChain provides a convenient toolkit for Gmail, allowing you to connect your LangChain email to the Gmail API. To get started, follow these steps:
- Set Up Credentials:
  - Download the credentials.json file as explained in the Gmail API documentation.
  - Install the required libraries:
pip install --upgrade google-api-python-client
pip install --upgrade google-auth-oauthlib
pip install --upgrade google-auth-httplib2
pip install beautifulsoup4  # Optional, for parsing HTML messages
- Create the Gmail Toolkit:
  - Initialize the toolkit with default settings:
from langchain.agents.agent_toolkits import GmailToolkit
toolkit = GmailToolkit()
  - Customize authentication as needed. Behind the scenes, a googleapi resource is created using the following methods:
from langchain.tools.gmail.utils import build_resource_service, get_gmail_credentials
credentials = get_gmail_credentials(
    token_file="token.json",
    scopes=["https://mail.google.com/"],
    client_secrets_file="credentials.json",
)
api_resource = build_resource_service(credentials=credentials)
toolkit = GmailToolkit(api_resource=api_resource)
- Use Toolkit Tools. The toolkit offers various tools:
  - _GmailCreateDraft_: Create a draft email with specified message fields.
  - _GmailSendMessage_: Send email messages.
  - _GmailSearch_: Search for email messages or threads.
  - _GmailGetMessage_: Fetch an email by message ID.
  - _GmailGetThread_: Fetch an email thread by thread ID.
- Initialize the Agent:
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, AgentType
llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
)
- Examples:
  - Create a Gmail draft for editing:
agent.run("Create a Gmail draft for me to edit...")
  - Search for the latest email in your drafts:
agent.run("Could you search in my drafts for the latest email?")
These examples demonstrate LangChain's Gmail toolkit capabilities, enabling programmatic interactions with Gmail.
SQL Database Agent
This agent interacts with SQL databases; the example below uses the Chinook sample database. Be cautious, as the agent is still in development. To use it:
- Initialize Agent:
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))
agent_executor = create_sql_agent(
llm=OpenAI(temperature=0),
toolkit=toolkit,
verbose=True,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
_Disclaimer_
- The query chain may generate insert/update/delete queries. Be cautious, and use a custom prompt or create a SQL user without write permissions if needed.
- Be aware that running certain queries, such as "run the biggest query possible," could overload your SQL database, especially if it contains millions of rows.
- Data warehouse-oriented databases often support user-level quotas to limit resource usage.
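One way to enforce the no-write recommendation at the connection level is to open the database read-only. Here is a sketch with Python's built-in sqlite3 module; the same `mode=ro` URI idea carries over to SQLAlchemy connection strings, though the exact syntax for your driver is an assumption to verify:

```python
import os
import sqlite3
import tempfile

# Create a throwaway database to demonstrate with.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE playlist (name TEXT)")
    conn.execute("INSERT INTO playlist VALUES ('Rock')")

# Open it read-only via a SQLite URI: writes now fail at the driver level,
# so even a generated DELETE/UPDATE query cannot modify data.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT name FROM playlist").fetchall())  # reads still work
try:
    ro.execute("DELETE FROM playlist")
except sqlite3.OperationalError as exc:
    print("write rejected:", exc)
```

Combined with a restricted SQL user, this gives defense in depth against destructive generated queries.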
- Examples:
  - Describe a table:
agent_executor.run("Describe the playlisttrack table")
  - Run a query:
agent_executor.run("List the total sales per country. Which country's customers spent the most?")
The agent will execute the query and provide the result, such as the country with the highest total sales.
To get the total number of tracks in each playlist, you can use the following query:
```
agent_executor.run("Show the total number of tracks in each playlist. The Playlist name should be included in the result.")
```
The agent will return the playlist names along with the corresponding total track counts.
- Caution: Be careful with queries that could overload your database.
Pandas DataFrame Agent
This agent interacts with Pandas DataFrames for question answering. Because it executes model-generated Python code, use it with caution:
- Initialize Agent:
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
from langchain.llms import OpenAI
import pandas as pd
df = pd.read_csv("titanic.csv")
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
- Examples:
  - Count rows in the DataFrame:
agent.run("how many rows are there?")
  - Filter rows based on criteria:
agent.run("how many people have more than 3 siblings")
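For comparison, the pandas code the agent typically generates for those two questions boils down to the following. This is a hand-written equivalent on a toy frame, since titanic.csv may not be at hand; `SibSp` is the Titanic dataset's column for siblings/spouses aboard:

```python
import pandas as pd

# Toy stand-in for titanic.csv with the relevant column.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "SibSp": [0, 4, 1, 5],  # number of siblings/spouses aboard
})

row_count = len(df)                      # "how many rows are there?"
many_siblings = (df["SibSp"] > 3).sum()  # "more than 3 siblings"
print(row_count, many_siblings)
```

The agent writes and runs snippets like these for you, which is precisely why its generated code should be sandboxed or reviewed.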
Jira Toolkit
The Jira toolkit allows agents to interact with a Jira instance. Follow these steps:
- Install Libraries and Set Environment Variables:
%pip install atlassian-python-api
import os
from langchain.agents import AgentType
from langchain.agents import initialize_agent
from langchain.agents.agent_toolkits.jira.toolkit import JiraToolkit
from langchain.llms import OpenAI
from langchain.utilities.jira import JiraAPIWrapper
os.environ["JIRA_API_TOKEN"] = "abc"
os.environ["JIRA_USERNAME"] = "123"
os.environ["JIRA_INSTANCE_URL"] = "https://jira.atlassian.com"
os.environ["OPENAI_API_KEY"] = "xyz"
llm = OpenAI(temperature=0)
jira = JiraAPIWrapper()
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
agent = initialize_agent(
toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
- Examples:
  - Create a new issue in a project:
agent.run("make a new issue in project PW to remind me to make more fried rice")
Now, you can interact with your Jira instance using natural language instructions and the Jira toolkit.