A gentle introduction to 🦜️🔗Langchain

hiimivantang

Posted on July 26, 2023

It's amazing how Langchain became so popular overnight (figuratively). In this post I explore what it does and why it became so hot.

Inception

Langchain was open-sourced by Harrison Chase in October 2022, and in a mere 9 months it has gained a mind-boggling 56.2K stars on its GitHub repo. To put this skyrocketing growth into perspective, it took Keras 101 months (about 8 years) to gain 58.9K stars.

[Image: GitHub star history chart]

👆 The growth looks ridiculous.

🤔 What exactly is Langchain?

It is a Python framework designed to simplify the creation of applications that use large language models (LLMs). Essentially, it comes with powerful building blocks and integrations that you can use to build your own LLM application. However, "simplify" seems to be an understatement here.

Getting started

Getting started is easy. Install Langchain with a simple pip install.

pip install langchain

But wait: if you are using a Python version older than 3.9, you will probably get clang compiler errors like these:

[Image: clang compiler errors during installation]

I like to use pyenv to switch between Python versions.

Use Python 3.9 to avoid the above errors when installing:

pyenv install 3.9.16 
pyenv global 3.9.16
pip install langchain
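If you are not sure which interpreter `pip` will install into, a quick check can save you from the compiler errors above. This is a minimal sketch: the helper name `meets_minimum_version` is hypothetical, and the 3.9 threshold simply mirrors the workaround described in this post rather than an official Langchain requirement.

```python
import sys


# Hypothetical helper: the (3, 9) minimum mirrors this post's workaround,
# not an official Langchain requirement.
def meets_minimum_version(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum_version():
        print(f"Python {sys.version.split()[0]} should be fine for `pip install langchain`.")
    else:
        print("Python < 3.9 detected; consider `pyenv install 3.9.16` first.")
```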

Document loaders

Document loaders provide an easy way to load data. As of today, there are 115 document loaders in total. Loading from any supported source is now reduced to just 2 lines of code: 1) initialize a loader object and 2) call its .load() method.

Even reading a YouTube transcript is simplified down to:

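from langchain.document_loaders import YoutubeLoader

# Step 1: initialize the loader; add_video_info=True also attaches video metadata such as the title
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True
)

# Step 2: fetch the transcript as a list of Document objects
docs = loader.load()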

Okay, I lied. You need 3 lines of code to load data from the supported sources. Remember to import the loader of your choice from langchain.document_loaders.

For the full list of supported sources, see the document loaders section of the Langchain documentation.
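Every loader follows the same two-step shape: construct it with source-specific arguments, then call `.load()` to get back a list of documents, each carrying its text (`page_content`) and a `metadata` dict. To make that shape concrete without depending on a particular Langchain version, here is a minimal, self-contained sketch. `SimpleDocument` and `TextFileLoader` are hypothetical stand-ins written for illustration; they are not Langchain classes.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class TextFileLoader:
    """Hypothetical loader illustrating the 'initialize, then .load()' pattern."""

    def __init__(self, path: str, encoding: str = "utf-8"):
        # Step 1: store everything needed to reach the source.
        self.path = Path(path)
        self.encoding = encoding

    def load(self) -> List[SimpleDocument]:
        # Step 2: read the source and wrap the text in document objects.
        text = self.path.read_text(encoding=self.encoding)
        return [SimpleDocument(page_content=text, metadata={"source": str(self.path)})]
```

Usage mirrors the YouTube example: `TextFileLoader("notes.txt").load()` returns a one-element list whose `metadata["source"]` records where the text came from.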

Chains and Document transformers

Think of chains as a means to glue multiple Langchain primitive components together. There are already a few "pre-glued" chains, such as Summarization, Retrieval QA, and SQL.
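Conceptually, a chain feeds the output of one step into the next: fill a prompt template, send it to an LLM, then post-process the reply. The sketch below shows that gluing idea with plain function composition. It is not Langchain's `Chain` API; `make_chain`, `fill_template`, `fake_llm`, and `strip_output` are hypothetical names, and `fake_llm` stands in for a real model call so the example runs offline.

```python
from functools import reduce
from typing import Any, Callable


# Hypothetical sketch of chaining via function composition; not Langchain's Chain API.
def make_chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Glue steps together so each one receives the previous step's output."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def fill_template(topic: str) -> str:
    # Step 1: turn user input into a prompt.
    return f"Summarize the following in one sentence: {topic}"


def fake_llm(prompt: str) -> str:
    # Step 2: hypothetical stand-in for an LLM call; it upper-cases the prompt's payload.
    return prompt.split(": ", 1)[1].upper()


def strip_output(text: str) -> str:
    # Step 3: post-process the model's reply.
    return text.strip()


summarize = make_chain(fill_template, fake_llm, strip_output)
print(summarize("langchain glues components together"))  # LANGCHAIN GLUES COMPONENTS TOGETHER
```

Swapping `fake_llm` for a real model call changes one step without touching the others, which is the convenience the pre-glued chains package up for you.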

Now, let's see how we can use a Summarization chain to summarize a recent YouTube video of the full team speeches on former Senior Minister Mr Tharman's intent to run for President of Singapore.

from langchain import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import YoutubeLoader

# temperature=0 for less random output; replace the placeholder with your own key
llm = OpenAI(temperature=0, openai_api_key="<YOUR-OPENAI-API-KEY>")

# Fetch the transcript (plus video metadata) as a list of Documents
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=7cZk5QPbmZ8", add_video_info=True)
docs = loader.load()

# map_reduce summarizes each document, then summarizes those summaries
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

[Image: screenshot of the token limit error]

Unfortunately, our simple attempt to summarize the YouTube video didn't work. The reason is that the GPT-3.5 model accepts at most 4096 tokens, but our request needs 7491 tokens (7235 for the prompt plus 256 for the completion)!
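The failure comes down to simple arithmetic: the prompt tokens plus the tokens reserved for the completion must fit inside the model's context window. A minimal check using the numbers from the error above and the 4096-token limit quoted in this post (the helper name `fits_context` is hypothetical):

```python
# Hypothetical helper; 4096 is the context limit quoted in this post.
def fits_context(prompt_tokens: int, completion_tokens: int, context_window: int = 4096) -> bool:
    """Return True if prompt + completion stays within the context window."""
    return prompt_tokens + completion_tokens <= context_window


requested = 7235 + 256  # prompt tokens + completion tokens
print(requested, fits_context(7235, 256))  # 7491 False
```

Since the prompt alone (7235 tokens) already exceeds the limit, no choice of completion length would make this request fit; the input itself has to shrink.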

This is where Langchain's Document Transformers shine. Let's try again, this time using a RecursiveCharacterTextSplitter to split the transcript into smaller chunks.
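Roughly speaking, a recursive character splitter tries coarse separators first (paragraph breaks) and only falls back to finer ones (line breaks, spaces, and finally individual characters) when a piece is still too long. The sketch below illustrates that idea under simplifying assumptions: it measures length in characters, assumes the separator order `"\n\n"`, `"\n"`, `" "`, `""`, and omits chunk overlap entirely. `recursive_split` is a hypothetical re-implementation for illustration, not Langchain's actual code.

```python
from typing import List, Sequence


# Simplified, hypothetical re-implementation for illustration; not Langchain's code.
def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Out of separators (or at the "" separator): cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Greedily merge neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph is short.\n\nSecond paragraph is a fair bit longer than the first one."
for chunk in recursive_split(sample, chunk_size=40):
    print(repr(chunk))
```

Here the first paragraph fits on its own, while the second is too long and gets split at spaces instead.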

from langchain import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key="<YOUR-OPENAI-API-KEY>")
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=7cZk5QPbmZ8", add_video_info=True)
docs = loader.load()

# Split the transcript into chunks of up to 3000 characters that overlap by 600;
# add_start_index=True records each chunk's offset in its metadata
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=600, length_function=len, add_start_index=True)
texts = text_splitter.create_documents([docs[0].page_content])

# Each chunk is now small enough for the model, so the map step can succeed
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(texts)
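The `chunk_overlap=600` setting means neighbouring chunks share some text, so context near a chunk boundary is not lost. To see what the numbers imply, here is a sketch using a plain fixed-size sliding window. `sliding_window_chunks` is a hypothetical helper that is simpler than Langchain's splitter (which also respects separators): with `chunk_size=3000` and `chunk_overlap=600`, each window starts 2400 characters after the previous one, and the returned start offsets play the role of what `add_start_index=True` stores in each chunk's metadata.

```python
from typing import List, Tuple


# Hypothetical fixed-size window; Langchain's splitter also respects separators.
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Return (start_index, chunk) pairs; consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[Tuple[int, str]] = []
    start = 0
    while True:
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


transcript = "x" * 10_000  # placeholder text standing in for a 10,000-character transcript
print([start for start, _ in sliding_window_chunks(transcript, 3000, 600)])  # [0, 2400, 4800, 7200]
```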

🥳 And we get:

Mr Tarman is a potential candidate for the Singaporean presidency who is known for his independence of thought, sharp intellect, and commitment to social justice. He is seen as a unifying figure who can bring people together and is an advocate for equality of resources in the international community. He has been a mentor to the speaker since 2005 and has helped many people escape poverty and discrimination. He is a respected diplomat and is socially savvy, often invited to events and offering other senior public office holders to grace such causes. He is the ideal candidate for the upcoming presidential election.

Wrap up

I've barely scratched the surface of what's possible with Langchain, but you have seen how, with merely 11 lines of code, we managed to:

  1. Download a YouTube video transcript.
  2. Split the transcript into overlapping chunks.
  3. For each chunk, send a prompt together with the chunk itself to the OpenAI API to get a summary.
  4. Finally, send a prompt to the OpenAI API to get a summary of the summaries from step 3.
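Steps 3 and 4 are the map-reduce pattern: summarize each chunk independently (map), then summarize the combined partial summaries (reduce). The sketch below captures that control flow with a stand-in summarizer so it runs without an API key. `map_reduce_summarize`, the prompt wording, and `first_sentence` are hypothetical; Langchain's real chain uses its own prompts and makes real LLM calls.

```python
from typing import Callable, List

# Hypothetical prompt wording; Langchain's chain ships its own prompts.
MAP_PROMPT = "Write a concise summary of the following:\n{text}"
REDUCE_PROMPT = "Combine these summaries into one concise summary:\n{text}"


def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Map: summarize every chunk on its own (these calls are independent).
    partial = [llm(MAP_PROMPT.format(text=chunk)) for chunk in chunks]
    # Reduce: summarize the partial summaries in one final call.
    return llm(REDUCE_PROMPT.format(text=" ".join(partial)))


def first_sentence(prompt: str) -> str:
    """Hypothetical stand-in 'LLM': returns the first sentence after the instruction line."""
    body = prompt.split("\n", 1)[1]
    return body.split(". ")[0].rstrip(".") + "."


chunks = ["Alpha is first. More alpha.", "Beta is second. More beta."]
print(map_reduce_summarize(chunks, first_sentence))  # Alpha is first.
```

With N chunks this makes N + 1 model calls, and the map calls could run in parallel since none depends on another.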

I hope you now understand why Langchain is so popular. If not, try implementing the above without Langchain and see how much more time, effort, and hair-pulling it takes.
