Building a Web Page Summarization App with Next.js, OpenAI, LangChain, and Supabase
Nasser Maronie
Posted on June 23, 2024
An app that can understand the context of any web page.
In this article, we'll show you how to create a handy web app that can summarize the content of any web page. Using Next.js for a smooth and fast web experience, LangChain for processing language, OpenAI for generating summaries, and Supabase for managing and storing vector data, we'll build a powerful tool together.
Why We're Building It
We all face information overload with so much content online. By making an app that gives quick summaries, we help people save time and stay informed. Whether you're a busy worker, a student, or just someone who wants to keep up with news and articles, this app will be a helpful tool for you.
How It Will Work
Our app will let users enter any website URL and quickly get a brief summary of the page. This means you can understand the main points of long articles, blog posts, or research papers without reading them fully.
Potential and Impact
This summarization app can be useful in many ways. It can help researchers skim through academic papers, keep news lovers updated, and more. Plus, developers can build on this app to create even more useful features.
Next.js
Next.js is a powerful and flexible React framework developed by Vercel that enables developers to build server-side rendered (SSR) and statically generated web applications with ease. It combines the best features of React with additional capabilities to create optimized and scalable web applications.
OpenAI
The OpenAI module in Node.js provides a way to interact with OpenAI’s API, allowing developers to leverage powerful language models like GPT-3 and GPT-4. This module enables you to integrate advanced AI functionalities into your Node.js applications.
LangChain.js
LangChain is a powerful framework designed for developing applications with language models. Originally developed for Python, it has since been adapted for other languages, including Node.js. Here’s an overview of LangChain in the context of Node.js:
What is LangChain?
LangChain is a library that simplifies the creation of applications using large language models (LLMs). It provides tools to manage and integrate LLMs into your applications, handle chaining of calls to these models, and enable complex workflows with ease.
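To make "chaining" a little more concrete, here is a minimal sketch in plain TypeScript. It is not LangChain's API: `Step`, `chain`, `buildPrompt`, and `fakeModel` are hypothetical names, and `fakeModel` only stands in for a real LLM call. The idea it illustrates is simply that each step receives the previous step's output.

```typescript
// A conceptual sketch of "chaining": each step takes the previous step's
// output. This is plain TypeScript, not LangChain's API.
type Step<In, Out> = (input: In) => Out;

// Compose two steps into one.
function chain<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

// Step 1: turn raw text into a prompt.
const buildPrompt: Step<string, string> = (text) =>
  `Summarize the following text:\n${text}`;

// Step 2: a stand-in for the model; it never calls a real LLM.
const fakeModel: Step<string, string> = (prompt) =>
  `[model reply to ${prompt.length} characters of prompt]`;

// The composed pipeline: text -> prompt -> (fake) model reply.
const summarize = chain(buildPrompt, fakeModel);
```

Later in this article, LangChain's `RunnableSequence` plays a similar role, with a retriever, a prompt template, the model, and an output parser as the steps.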
How Large Language Models (LLMs) Work
Large Language Models (LLMs) like OpenAI’s GPT-3.5 are trained on vast amounts of text data to understand and generate human-like text. They can generate responses, translate languages, and perform many other natural language processing tasks.
Supabase
Supabase is an open-source backend-as-a-service (BaaS) platform designed to help developers quickly build and deploy scalable applications. It offers a suite of tools and services that simplify database management, authentication, storage, and real-time capabilities, all built on top of PostgreSQL.
Prerequisites
Before we start, make sure you have the following:
- Node.js and npm installed
- A Supabase account
- An OpenAI account
Step 1: Setting Up Supabase
First, we need to set up a Supabase project and create the necessary tables to store our data.
Create a Supabase Project
- Go to Supabase and sign up for an account.
- Create a new project and make note of your project URL and anon (public) API key. You'll need these later.
SQL Script for Supabase
Create a new SQL query in your Supabase dashboard and run the following scripts to create the required tables and functions:
First, enable the pgvector extension (if it isn't already enabled) so the database can store and compare vectors:
create extension if not exists vector;
Next, create a table named "documents". This table will store the content of each web page, split into chunks, together with the vector embedding of every chunk:
create table if not exists documents (
id bigint primary key generated always as identity,
content text,
metadata jsonb,
embedding vector(1536)
);
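The `1536` in `vector(1536)` has to match the length of the vectors your embedding model returns, because pgvector rejects rows whose dimension differs from the column's. As a small sketch of that constraint, the hypothetical helper below (not part of pgvector, Supabase, or LangChain) checks the length and renders the embedding in pgvector's text format. In this article LangChain does the insert for us, so treat it only as an illustration.

```typescript
// Must match the dimension declared in the table: vector(1536).
const EMBEDDING_DIMENSIONS = 1536;

// Hypothetical helper: validate an embedding and render it in pgvector's
// text format, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(
  embedding: number[],
  dimensions: number = EMBEDDING_DIMENSIONS
): string {
  if (embedding.length !== dimensions) {
    throw new Error(
      `Expected ${dimensions} dimensions, got ${embedding.length}`
    );
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

If you switch to an embedding model with a different output size, the column definition and the `match_documents` function need the same number.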
Now, we need a function to query our embedded data:
create or replace function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql as $$
begin
  return query
  select
    -- qualify the columns so they are not ambiguous with the
    -- output columns declared in "returns table" above
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
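If the SQL feels opaque, the following in-memory sketch shows roughly what `match_documents` computes. pgvector's `<=>` operator returns the cosine distance, so `1 - distance` is the cosine similarity. Everything here (`Row`, `cosineSimilarity`, `contains`, `matchDocuments`) is a hypothetical TypeScript stand-in; the real work happens inside Postgres, and `contains` only approximates `metadata @> filter` for flat, primitive values.

```typescript
interface Row {
  id: number;
  content: string;
  metadata: Record<string, unknown>;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Simplified stand-in for `metadata @> filter`: every top-level key in the
// filter must be present in the metadata with an equal primitive value.
function contains(
  metadata: Record<string, unknown>,
  filter: Record<string, unknown>
): boolean {
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

function matchDocuments(
  rows: Row[],
  queryEmbedding: number[],
  matchCount: number,
  filter: Record<string, unknown> = {}
) {
  return rows
    .filter((row) => contains(row.metadata, filter))
    .map((row) => ({
      id: row.id,
      content: row.content,
      metadata: row.metadata,
      similarity: cosineSimilarity(row.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.similarity - x.similarity) // highest similarity first
    .slice(0, matchCount);
}
```

Ordering by ascending distance in SQL and by descending similarity here give the same ranking.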
Next, we need a table for storing each web page's details:
create table if not exists files (
id bigint primary key generated always as identity,
url text not null,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
Step 2: Setting Up OpenAI
Create OpenAI Project
- Visit the OpenAI website: Go to OpenAI's website, sign up, and create a new project.
- Navigate to the API section: After logging in, open the API section of the dashboard and create a new API key.
Step 3: Setting Up Next.js
Create Next.js app
npx create-next-app summarize-page
cd ./summarize-page
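The code in this article assumes TypeScript, a src/ directory, the Pages Router, and the default @/* import alias, so choose those options when create-next-app prompts you. Then install the required dependencies: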
npm install @langchain/community @langchain/core @langchain/openai @supabase/supabase-js langchain openai axios
Then install Material UI for building the interface. Feel free to use another UI library instead:
npm install @mui/material @emotion/react @emotion/styled
Step 4: OpenAI and Supabase clients
Next, we need to set up the OpenAI and Supabase clients. Create a libs directory in your project and add the following files.
src/libs/openAI.ts
This file will configure the OpenAI client.
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
const openAIApiKey = process.env.OPENAI_API_KEY;
if (!openAIApiKey) throw new Error('OpenAI API Key not found.')
export const llm = new ChatOpenAI({
openAIApiKey,
modelName: "gpt-3.5-turbo",
temperature: 0.9,
});
export const embeddings = new OpenAIEmbeddings(
{
openAIApiKey,
},
{ maxRetries: 0 }
);
- llm: The language model instance, which will generate our summaries.
- embeddings: This will create embeddings for our documents, which help in finding similar content.
src/libs/supabaseClient.ts
This file will configure the Supabase client.
import { createClient } from "@supabase/supabase-js";
const supabaseUrl = process.env.SUPABASE_URL || "";
const supabaseAnonKey = process.env.SUPABASE_ANON_KEY || "";
if (!supabaseUrl) throw new Error("Supabase URL not found.");
if (!supabaseAnonKey) throw new Error("Supabase Anon key not found.");
export const supabaseClient = createClient(supabaseUrl, supabaseAnonKey);
- supabaseClient: The Supabase client instance to interact with our Supabase database.
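Both client files repeat the same "read a variable, throw if it is missing" pattern. One way to tidy that up is a small helper like the sketch below. `requireEnv` and the `Env` type are hypothetical names, not something from the article's repository, and the environment is passed in as a parameter so the function is easy to test.

```typescript
// A map of environment variables, shaped like process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: read a required environment variable and fail fast
// with a clear message when it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// In the app you would pass process.env, for example:
// const supabaseUrl = requireEnv(process.env, "SUPABASE_URL");
```

Failing at startup with the variable's name in the message is usually easier to debug than a client that is created with an empty string.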
Step 5: Creating Services for Content and Files
Create a services directory and add the following files to handle fetching content and managing files.
src/services/content.ts
This service will fetch the web page content and clean it by removing HTML tags, scripts, and styles.
import axios from "axios";
export async function getContent(url: string): Promise<string> {
let htmlContent: string = "";
const response = await axios.get(url as string);
htmlContent = response.data;
if (!htmlContent) return "";
// Remove unwanted elements and tags
return htmlContent
.replace(/style="[^"]*"/gi, "")
.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
.replace(/\s*on\w+="[^"]*"/gi, "")
.replace(
/<script(?![^>]*application\/ld\+json)[^>]*>[\s\S]*?<\/script>/gi,
""
)
.replace(/<[^>]*>/g, "")
.replace(/\s+/g, " ");
}
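This function fetches the HTML of a given URL and cleans it up: it removes inline style attributes, <style> blocks, inline event-handler attributes, and scripts (except JSON-LD blocks, whose contents are kept), then strips all remaining HTML tags and collapses whitespace.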
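To see what the regular expressions do without making a network request, here is the same cleaning pipeline as a pure function applied to a small sample. `stripHtml` and `sample` are hypothetical names introduced for this sketch; the regexes are copied from `getContent` above.

```typescript
// Same replacements as getContent, minus the HTTP request.
function stripHtml(html: string): string {
  return html
    .replace(/style="[^"]*"/gi, "") // inline style attributes
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "") // <style> blocks
    .replace(/\s*on\w+="[^"]*"/gi, "") // inline event handlers (onclick, ...)
    .replace(
      // every <script> except JSON-LD blocks
      /<script(?![^>]*application\/ld\+json)[^>]*>[\s\S]*?<\/script>/gi,
      ""
    )
    .replace(/<[^>]*>/g, "") // any remaining tags
    .replace(/\s+/g, " "); // collapse whitespace
}

const sample =
  '<html><head><style>p { color: red; }</style><script>alert(1)</script></head>' +
  '<body><p style="margin:0" onclick="go()">Hello <b>world</b></p></body></html>';

const cleaned = stripHtml(sample); // "Hello world"
```

Two things to keep in mind: HTML entities such as `&amp;` are left as they are, and regex-based cleaning is a pragmatic shortcut rather than a real HTML parser, so unusual markup may slip through.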
src/services/file.ts
This service will save the web page content into Supabase and retrieve summaries.
import { embeddings, llm } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { formatDocumentsAsString } from "langchain/util/document";
export interface IFile {
id?: number | undefined;
url: string;
created_at?: Date | undefined;
}
export async function saveFile(url: string, content: string): Promise<IFile> {
const doc = await supabaseClient
.from("files")
.select()
.eq("url", url)
.single<IFile>();
if (!doc.error && doc.data?.id) return doc.data;
const { data, error } = await supabaseClient
.from("files")
.insert({ url })
.select()
.single<IFile>();
if (error) throw error;
const splitter = new RecursiveCharacterTextSplitter({
separators: ["\n\n", "\n", " ", ""],
});
const output = await splitter.createDocuments([content]);
const docs = output.map((d) => ({
...d,
metadata: { ...d.metadata, file_id: data.id },
}));
await SupabaseVectorStore.fromDocuments(docs, embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
return data;
}
export async function getSummarization(fileId: number): Promise<string> {
const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
const retriever = vectorStore.asRetriever({
filter: (rpc) => rpc.filter("metadata->>file_id", "eq", fileId),
k: 2,
});
const SYSTEM_TEMPLATE = `Use the following pieces of context, explain what is it about and summarize it.
If you can't explain it, just say that you don't know, don't try to make up some explanation.
----------------
{context}`;
const messages = [
SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
HumanMessagePromptTemplate.fromTemplate("{format_answer}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);
const chain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
format_answer: new RunnablePassthrough(),
},
prompt,
llm,
new StringOutputParser(),
]);
const format_summarization =
`
Give it title, subject, description, and the conclusion of the context in this format, replace the brackets with the actual content:
[Write the title here]
By: [Name of the author or owner or user or publisher or writer or reporter if possible, otherwise leave it "Not Specified"]
[Write the subject, it could be a long text, at least minimum of 300 characters]
----------------
[Write the description in here, it could be a long text, at least minimum of 1000 characters]
Conclusion:
[Write the conclusion in here, it could be a long text, at least minimum of 500 characters]
`;
const summarization = await chain.invoke(format_summarization);
return summarization;
}
- saveFile: Saves the file and its content to Supabase, splits the content into manageable chunks, and stores them in the vector store.
- getSummarization: Retrieves relevant documents from the vector store and generates a summary using OpenAI.
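To give a feel for what "splitting the content into manageable chunks" means, here is a deliberately simplified sketch that cuts text into fixed-size windows with some overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on the separators it is given and uses the library's default sizes here); `splitIntoChunks` and `toDocuments` are hypothetical names used only for illustration.

```typescript
// Simplified stand-in for a text splitter: fixed-size windows with overlap.
function splitIntoChunks(
  text: string,
  chunkSize: number,
  chunkOverlap: number
): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Attach the owning file's id to every chunk, mirroring the
// metadata.file_id field that saveFile writes.
function toDocuments(chunks: string[], fileId: number) {
  return chunks.map((pageContent) => ({
    pageContent,
    metadata: { file_id: fileId },
  }));
}
```

The overlap means neighbouring chunks share some text, so a sentence that straddles a boundary still appears whole in at least one chunk. The `file_id` in the metadata is what lets `getSummarization` restrict retrieval to the chunks of a single page.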
Step 6: Creating an API Handler
Now, let's create an API handler to process the content and generate a summary.
src/pages/api/content.ts
import { getContent } from "@/services/content";
import { getSummarization, saveFile } from "@/services/file";
import { NextApiRequest, NextApiResponse } from "next";
export default async function handler(
req: NextApiRequest,
res: NextApiResponse
) {
// Only POST is supported; 405 tells the client the method is not allowed
if (req.method !== "POST")
return res.status(405).json({ message: "Method not allowed" });
const { body } = req;
try {
const content = await getContent(body.url);
const file = await saveFile(body.url, content);
const result = await getSummarization(file.id as number);
res.status(200).json({ result });
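} catch (err) {
// Error objects serialize to {} in JSON, so send the message instead
const message = err instanceof Error ? err.message : String(err);
res.status(500).json({ error: message });
}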
}
This API handler receives a URL, fetches the content, saves it to Supabase, and generates a summary. It calls both the saveFile and getSummarization functions from our services.
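The handler passes `body.url` straight to the content service, so a missing or malformed URL only surfaces as a generic 500 error. Because the server fetches whatever URL it is given, it is also worth restricting requests to `http` and `https`. The sketch below shows one possible validation step; `Validation` and `validateRequest` are hypothetical names, and the status codes are a suggestion rather than the article's behaviour.

```typescript
type Validation =
  | { ok: true; url: string }
  | { ok: false; status: number; message: string };

// Hypothetical pre-check for the API handler: verify the HTTP method and
// make sure the body carries a well-formed http(s) URL.
function validateRequest(
  method: string | undefined,
  body: unknown
): Validation {
  if (method !== "POST") {
    return { ok: false, status: 405, message: "Method not allowed" };
  }
  const raw = (body as { url?: unknown } | null)?.url;
  if (typeof raw !== "string" || raw.trim() === "") {
    return { ok: false, status: 400, message: "Missing url" };
  }
  let parsed: URL;
  try {
    parsed = new URL(raw.trim()); // throws on malformed input
  } catch {
    return { ok: false, status: 400, message: "Invalid url" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return {
      ok: false,
      status: 400,
      message: "Only http(s) URLs are supported",
    };
  }
  return { ok: true, url: parsed.toString() };
}
```

Inside the handler you could call `validateRequest(req.method, req.body)` first and return `res.status(check.status).json({ message: check.message })` when `check.ok` is false. A protocol check alone does not stop requests to internal addresses, so a public deployment would likely need further restrictions.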
Step 7: Building the Frontend
Finally, let's create the frontend in src/pages/index.tsx to allow users to input URLs and display the summarizations.
src/pages/index.tsx
import axios from "axios";
import { useState } from "react";
import {
Alert,
Box,
Button,
Container,
LinearProgress,
Stack,
TextField,
Typography,
} from "@mui/material";
export default function Home() {
const [loading, setLoading] = useState(false);
const [url, setUrl] = useState("");
const [result, setResult] = useState("");
const [error, setError] = useState<any>(null);
const onSubmit = async () => {
try {
setError(null);
setLoading(true);
const res = await axios.post("/api/content", { url });
setResult(res.data.result);
} catch (err) {
console.error("Failed to fetch content", err);
setError(err as any);
} finally {
setLoading(false);
}
};
return (
<Box sx={{ height: "100vh", overflowY: "auto" }}>
<Container
sx={{
backgroundColor: (theme) => theme.palette.background.default,
position: "sticky",
top: 0,
zIndex: 2,
py: 2,
}}
>
<Typography sx={{ mb: 2, fontSize: "24px" }}>
Summarize the content of any page
</Typography>
<TextField
fullWidth
label="Input page's URL"
value={url}
onChange={(e) => {
if (result) setResult("");
setUrl(e.target.value);
}}
sx={{ mb: 2 }}
/>
<Button
disabled={loading}
variant="contained"
onClick={onSubmit}
>
Summarize
</Button>
</Container>
<Container maxWidth="lg" sx={{ py: 2 }}>
{loading ? (
<LinearProgress />
) : (
<Stack sx={{ gap: 2 }}>
{result && (
<Alert>
<Typography
sx={{
whiteSpace: "pre-line",
wordBreak: "break-word",
}}
>
{result}
</Typography>
</Alert>
)}
{error && <Alert severity="error">{error.message || error}</Alert>}
</Stack>
)}
</Container>
</Box>
);
}
This React component allows users to input a URL, submit it, and display the generated summary. It handles loading states and error messages to provide a better user experience.
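One detail worth hardening: the component renders `{error.message || error}`, and React refuses to render a plain object, so an error without a message would crash the page instead of showing an alert. A small helper like the sketch below always produces a string. `toErrorMessage` is a hypothetical name, and the fallback text is just a placeholder.

```typescript
// Hypothetical helper: turn whatever was thrown into a string that is safe
// to render inside <Alert>.
function toErrorMessage(err: unknown): string {
  if (typeof err === "string") return err;
  if (err instanceof Error) return err.message;
  if (typeof err === "object" && err !== null && "message" in err) {
    const message = (err as { message: unknown }).message;
    if (typeof message === "string") return message;
  }
  return "Something went wrong";
}
```

With this in place, the state could hold a plain string, for example `setError(toErrorMessage(err))`, and the JSX would simply render `{error}`.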
Step 8: Running the Application
Create a .env file in the root of your project to store your environment variables:
SUPABASE_URL=your-supabase-url
SUPABASE_ANON_KEY=your-supabase-anon-key
OPENAI_API_KEY=your-openai-api-key
Finally, start your Next.js application:
npm run dev
You should now have a running application where you can enter a web page's URL and receive a summary of that page.
Conclusion
Congratulations! You've built a fully functional web page summarization application using Next.js, OpenAI, LangChain, and Supabase. Users can input a URL, fetch the content, store it in Supabase, and generate a summary using OpenAI's capabilities. This setup provides a robust foundation for further enhancements and customization based on your needs.
Feel free to expand on this project by adding more features, improving the UI, or integrating additional APIs.
Check the source code in this repo:
https://github.com/firstpersoncode/summarize-page
Happy coding!