How to Use Cloudinary AI to Write Better Image Captions
oteri
Posted on October 21, 2024
Have you always found it challenging to add captions to your images on social media platforms like X and LinkedIn for accessibility using alt text?
Caption Image is an app that automatically solves this problem by analyzing your image and its details using Cloudinary AI to provide a perfect description automatically.
This guide will cover connecting the server code (API) to the client side to build a robust full-stack application for image captions.
Want to give it a try? Check out the Caption Image app here.
Before you begin
Prerequisites
Basic understanding of React
Node.js installed on your local machine
Set up a Cloudinary account
Creating the server
Run this command to create your project as follows:
mkdir caption-image-server
cd caption-image-server
npm init -y // initialize the folder
After this setup, install the following dependencies to be able to build the API:
npm install nodemon --save-dev
Nodemon: Runs your development server and monitors changes for any change in the code
npm install cors cloudinary dotenv express
cors: it allows you to perform cross-domain requests in web applications
cloudinary: cloud storage for image and video
dotenv: load environment variables from a
.env
fileexpress: a node.js framework for building APIs
In the package.json
, update the script objects with the following:
{
...
"scripts": {
"start": "node index",
"start:dev": "nodemon index"
},
...
}
The index
represents the file used to create the backend code. Run this code to create the file:
touch index.js
Environment variables
The environment variables keep our credentials secret and prevent them from being leaked when pushed to GitHub.
.env
CLOUDINARY_CLOUD_NAME=your_cloud_name
CLOUDINARY_API_KEY=your_api_key
CLOUDINARY_API_SECRET=your_api_secret
Go to your Cloudinary dashboard, and you will have access to your values. Replace the placeholder text after the equal sign.
Let's create the server. Copy-paste this code into your index.js
file:
import express from "express";
import { v2 as cloudinary } from "cloudinary";
import * as dotenv from "dotenv";
import cors from "cors";
dotenv.config();
const app = express();
app.use(cors());
app.use(express.json());
cloudinary.config({
cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
api_key: process.env.CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
});
app.get("/", (req, res) => {
res.status(200).json({
message: "Upload and generate image caption with Cloudinary AI",
});
});
app.post("/api/v1/caption", async (req, res) => {
try {
const { imageUrl } = req.body;
if (!imageUrl) {
return res.status(400).json({
success: false,
message: "Image URL is required",
});
}
const result = await cloudinary.uploader.upload(imageUrl, {
detection: "captioning",
});
res.status(200).json({
success: true,
caption: result.info.detection.captioning.data.caption,
});
} catch (error) {
res.status(500).json({
success: false,
message: "Unable to generate image caption",
error: error.message,
});
}
});
const startServer = async () => {
try {
app.listen(8080, () => console.log("Server started on port 8080"));
} catch (error) {
console.log("Error starting server:", error);
}
};
startServer();
The code snippet shows the endpoints to the GET and POST HTTP methods. The POST method reads the image and crafts a caption. Check out Cloudinary AI Content Analysis to learn more about the practical use case of this technology.
Start the development server
In your terminal, use the command to run the server at http://localhost:8080
.
npm run start:dev
Creating the UI
Next.js is a popular framework among frontend developers because it helps create beautiful and user-friendly interfaces with reusable components.
Installation
As with any project, we need to create the boilerplate code that includes the files and folders with this command:
npx create-next-app@latest caption-image-client
During installation, some prompts will appear, allowing you to choose your preferences for the project.
Next, install these dependencies:
npm install react-toastify next-cloudinary cloudinary copy-to-clipboard
react-toastify: for notification
next-cloudinary: The Cloudinary package is developed for high-performance image and video delivery
copy-to-clipboard: copy text to the clipboard
Environment variables
In the same way, as with the backend code, we need to create the environment variables in the root directory with the following:
.env
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=your_cloud_name
NEXT_PUBLIC_CLOUDINARY_API_KEY=your_api_key
CLOUDINARY_API_SECRET=your_api_secret
These variables will help sign our requests because we will use Cloudinary signed uploads to send files to the cloud. The signed uploads add an extra security layer to file uploads instead of unsigned uploads.
Configuring Cloudinary
Create a lib
folder in the root directory, and it, a file name cloudinary.js
lib/cloudinary.js
import { v2 as cloudinary } from "cloudinary";
cloudinary.config({
cloud_name: process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME,
api_key: process.env.NEXT_PUBLIC_CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
});
export default cloudinary;
Next, in the App router, create a new API route with this file name, api/sign-cloudinary-params/route.js
:
app/api/sign-cloudinary-params/route.js
import cloudinary from "@/lib/cloudinary";
export async function POST(request) {
const body = await request.json();
const { paramsToSign } = body;
const signature = cloudinary.utils.api_sign_request(
paramsToSign,
process.env.CLOUDINARY_API_SECRET
);
return Response.json({ signature });
}
Displaying the UI content
Here, the home route will display the content users can interact with within the app.
app/page.js
"use client";
import { CldUploadWidget } from "next-cloudinary";
import { useState } from "react";
import copy from "copy-to-clipboard";
import { ToastContainer, toast } from "react-toastify";
import "react-toastify/dist/ReactToastify.css";
export default function Home() {
const [imageUrl, setImageUrl] = useState("");
const [caption, setCaption] = useState("");
const [loading, setLoading] = useState(false);
const handleUploadSuccess = async (result) => {
setLoading(true);
setImageUrl(result.info.secure_url);
const response = await fetch(
"http://localhost:8080/api/v1/caption",
{
method: "post",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({ imageUrl: result.info.secure_url }),
}
);
const data = await response.json();
setCaption(data.caption);
setLoading(false);
};
const handleCopyCaption = () => {
copy(caption, {
debug: true,
});
toast("✅ Copied");
};
return (
<main className='flex min-h-screen flex-col items-center justify-center p-4 bg-gray-100'>
<h1 className='text-2xl font-bold mb-8'>AI-Enhanced Photo Captioner</h1>
<div className='w-full max-w-4xl grid grid-cols-1 md:grid-cols-2 gap-4 mb-8'>
{imageUrl && (
<div className='flex flex-col items-center justify-center border border-dashed border-gray-400 p-4 bg-white rounded-lg'>
<h2 className='text-xl font-semibold mb-2'>Uploaded Image:</h2>
<img
src={imageUrl}
alt='Uploaded'
className='w-full h-auto object-contain'
style={{ maxHeight: "60vh" }}
/>
</div>
)}
{caption && (
<div className='flex flex-col items-center justify-center border border-dashed border-gray-400 p-4 bg-white rounded-lg'>
<h2 className='text-xl font-semibold mb-2'>Caption:</h2>
<p className='select-none'>{caption}</p>
<button
className='px-4 py-2 bg-emerald-700 text-white font-semibold rounded-lg shadow-md hover:bg-emerald-600 focus:outline-none mt-5 cursor-pointer'
onClick={handleCopyCaption}>
Copy caption
</button>
</div>
)}
</div>
<div className='flex flex-col items-center'>
<CldUploadWidget
signatureEndpoint='/api/sign-cloudinary-params'
onSuccess={handleUploadSuccess}>
{({ open }) => (
<button
onClick={() => open()}
disabled={loading}
className='px-4 py-2 bg-blue-600 text-white font-semibold rounded-lg shadow-md hover:bg-blue-700 focus:outline-none disabled:bg-gray-400'>
{loading ? "Loading..." : "Upload an Image"}
</button>
)}
</CldUploadWidget>
</div>
<ToastContainer theme='dark' />
</main>
);
}
Now that we have the code for the home page clicking the "Upload an Image" button opens the Cloudinary widget interface that offers many options for uploading an image. Once you have selected an image, it processes the data with Cloudinary, generating both the picture and the caption side-by-side. Then, a notification pops up when you "Copy caption" to the clipboard for later use as an alternative text for your image.
Tech stack
These are the following technologies that made it possible to build the AI-enhanced photo captioner:
Next.js
Cloudinary
Vercel
Express
Important links
Caption Image: https://caption-image-gamma.vercel.app/
Server code: https://github.com/Terieyenike/caption-image-server
GitHub repo: https://github.com/Terieyenike/caption-image-client
API: https://caption-image-server.onrender.com/
Deployment
These two technologies managed the deployment of the app on the web.
Vercel: helps deploy frontend web applications
Render: hosting the server code (API) in the cloud
Conclusion
Everything is made possible by using AI. It shows how efficiently AI is used to our advantage in creating for humans.
The AI-enhanced photo captioner is one example of the power of Cloudinary APIs and tools for building your next app. It removes the need to use other tools that provide similar services when bundling it all in Cloudinary.
Happy coding!
Posted on October 21, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.