How to Upload Images to Google Gemini in Next.js

Introduction

Google Gemini performs strongly on multimodal tasks, particularly its latest models, Gemini 1.5 Flash and Gemini 1.5 Pro. The table below compares them with GPT-4o on two multimodal benchmarks, one for reasoning (MMMU) and one for math (MathVista). On MathVista, Gemini 1.5 Pro performs on par with GPT-4o (63.9% vs. 63.8%) 🎉, while GPT-4o still leads on MMMU.

Benchmark	Description	Gemini 1.5 Flash	Gemini 1.5 Pro	GPT-4o
MMMU	Multi-discipline college-level reasoning problems	56.1%	62.2%	69.1%
MathVista	Mathematical reasoning in visual contexts	58.4%	63.9%	63.8%

In this post, I'll show you how to unlock the vision capabilities of Google Gemini in a Next.js app. Let's get started 🚀.

Prerequisite

In my previous post, I demonstrated how to use Google Gemini with Next.js to stream output. That guide focused on text input. This one uses a simple demo to show how to upload images to Google Gemini. If you're unfamiliar with getting a Google AI API key or using the Vercel AI SDK, I recommend reading the previous post first.

Server-Side

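Here is the complete server-side function. The client page below imports it from ./actions. Compared with the previous post, the main modification is that the custom Message type is removed and CoreMessage is imported from the ai package instead.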

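"use server"; // every export in this file is a Server Action

import { google } from "@ai-sdk/google";
import { CoreMessage, streamText } from "ai";
import { createStreamableValue } from "ai/rsc";

export async function continueConversation(history: CoreMessage[]) {
  // Streamable value that the client consumes with readStreamableValue()
  const stream = createStreamableValue();
  const model = google.chat("models/gemini-1.5-pro-latest");

  // Stream in the background so the action can return immediately
  (async () => {
    const { textStream } = await streamText({
      model: model,
      messages: history, // may contain text parts and base64 image parts
    });

    for await (const text of textStream) {
      stream.update(text);
    }

    stream.done();
  })().catch((error) => {
    // Propagate failures (e.g. a rejected image) to the client instead of hanging
    stream.error(error);
  });

  return {
    messages: history,
    newMessage: stream.value,
  };
}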

CoreMessage is a union type that can carry several kinds of data. CoreUserMessage is a message sent by the user. Its role is always user, and its content is flexible: UserContent is either a plain string or an array of parts, where each part is a TextPart or an ImagePart. The relevant type definitions look like this:

type CoreUserMessage = {
  role: 'user';
  content: UserContent;
};

type UserContent = string | Array<TextPart | ImagePart>;

interface TextPart {
  type: 'text';
  text: string;
}

interface ImagePart {
  type: 'image';
  /**
   * Image data. Can either be:
   *
   * - data: a base64-encoded string, a Uint8Array, an ArrayBuffer, or a Buffer
   * - URL: a URL that points to the image
   */
  image: DataContent | URL;
  /**
   * Optional mime type of the image.
   */
  mimeType?: string;
}

type DataContent = string | Uint8Array | ArrayBuffer | Buffer;
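Because UserContent is a union, any code that reads a message has to branch on whether content is a string or an array of parts. This is also true of the message preview in the page further down. Below is a minimal sketch that uses simplified local copies of the types above, so it runs without the SDK. The summarizeContent helper and its "[image: ...]" placeholder are hypothetical and not part of the AI SDK. Unlike a check on content[0] alone, it visits every part.

```typescript
// Simplified local copies of the AI SDK types shown above (assumption: the
// real DataContent also allows Uint8Array, ArrayBuffer and Buffer).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string | URL; mimeType?: string };
type UserContent = string | Array<TextPart | ImagePart>;

// Hypothetical helper: flatten UserContent into one line of text,
// replacing each image part with a placeholder.
function summarizeContent(content: UserContent): string {
  // A plain string is already the whole message.
  if (typeof content === "string") {
    return content;
  }
  // Otherwise visit every part, narrowing on the `type` discriminant.
  return content
    .map((part) =>
      part.type === "text"
        ? part.text
        : `[image: ${part.mimeType ?? "unknown type"}]`,
    )
    .join(" ");
}

console.log(summarizeContent("Hello")); // prints Hello
console.log(
  summarizeContent([
    { type: "image", image: "iVBORw0KGgo=", mimeType: "image/png" },
    { type: "text", text: "What is this picture about?" },
  ]),
); // prints [image: image/png] What is this picture about?
```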

Take a closer look at ImagePart. Its image field accepts either base64-encoded image data or a URL that points to the image. To keep this demo simple, we pass base64-encoded image data in the message.
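Since content is an array, a single user message can also carry an image part and a text part together. The page below sends them as two separate user messages instead. Here is a minimal sketch of the single-message variant. It again uses simplified local types so it runs on its own. buildUserMessage is a hypothetical helper, and the base64 string and example.com URL are placeholders.

```typescript
// Simplified local copies of the AI SDK types (assumption: only the fields used here).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string | Uint8Array | URL; mimeType?: string };
type CoreUserMessage = { role: "user"; content: string | Array<TextPart | ImagePart> };

// Hypothetical helper: put an optional image and optional text into ONE user message.
function buildUserMessage(
  text: string,
  image?: { data: string | URL; mimeType?: string },
): CoreUserMessage {
  const content: Array<TextPart | ImagePart> = [];
  if (image) {
    // `data` is either raw base64 (no "data:...;base64," prefix) or a URL
    content.push({ type: "image", image: image.data, mimeType: image.mimeType });
  }
  if (text.length > 0) {
    content.push({ type: "text", text });
  }
  return { role: "user", content };
}

// Base64 payload plus a question, in one message
const fromBase64 = buildUserMessage("What is this picture about?", {
  data: "iVBORw0KGgo=",
  mimeType: "image/png",
});

// The same shape, pointing at a hosted image instead
const fromUrl = buildUserMessage("Describe this image.", {
  data: new URL("https://example.com/cover.png"),
});

console.log(JSON.stringify(fromBase64));
console.log(fromUrl.content.length); // prints 2
```

With this shape, a preview would need to render every part of a message, not only content[0].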

Client-Side

The page needs the most changes. It has to let the user pick an image, encode it as base64, send it as a message, and preview it in the conversation. Below is the complete code for the updated page. You can copy and paste it, and I'll explain the key points afterward.

"use client";

import { useState } from "react";
import { continueConversation } from "./actions";
import { readStreamableValue } from "ai/rsc";
import { CoreMessage } from "ai";

export default function Home() {
  const [conversation, setConversation] = useState<CoreMessage[]>([]);
  // Data URL of the selected image, e.g. "data:image/png;base64,iVBOR..."
  const [imageInput, setImageInput] = useState<string>("");
  const [textInput, setTextInput] = useState<string>("");

  // Read a File as a data URL ("data:<mime type>;base64,<payload>")
  async function getBase64(file: File): Promise<string> {
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => {
        resolve(reader.result as string);
      };
      reader.onerror = () => {
        reject(reader.error);
      };
      reader.readAsDataURL(file);
    });
  }

  return (
    <div>
      <div>
        {conversation.map((message, index) => (
          <div key={index}>
            {message.role}:{" "}
            {
              // A string is shown as-is. Otherwise look at the first part:
              // an image part is previewed by rebuilding a data URL from its raw
              // base64 payload ("image/png" is only a fallback when no mimeType
              // was sent), and a text part is shown as text.
              typeof message.content === "string" ? (
                message.content
              ) : message.content[0].type === "image" ? (
                <img
                  alt="Uploaded image"
                  src={`data:${message.content[0].mimeType ?? "image/png"};base64,${message.content[0].image}`}
                  width={640}
                />
              ) : message.content[0].type === "text" ? (
                message.content[0].text
              ) : (
                ""
              )
            }
          </div>
        ))}
      </div>

      <div>
        <input
          type="file"
          accept="image/*"
          onChange={(event) => {
            // files[0] is undefined when the selection is cleared or cancelled
            const file = event.target.files?.[0];
            if (file) {
              getBase64(file)
                .then((result) => {
                  setImageInput(result);
                })
                .catch(() => {
                  setImageInput("");
                });
            } else {
              setImageInput("");
            }
          }}
        />
        <input
          type="text"
          value={textInput}
          onChange={(event) => {
            setTextInput(event.target.value);
          }}
        />
        <button
          onClick={async () => {
            // append user messages
            const userMessages: CoreMessage[] = [];
            // Split "data:<mime type>;base64,<payload>": the image field expects the
            // raw base64 payload without the prefix. [^;,]+ also matches types such
            // as image/svg+xml.
            const match = /^data:([^;,]+);base64,/.exec(imageInput);
            if (match) {
              userMessages.push({
                role: "user",
                content: [
                  {
                    type: "image",
                    image: imageInput.slice(match[0].length),
                    mimeType: match[1],
                  },
                ],
              });
            }
            if (textInput.length) {
              userMessages.push({
                role: "user",
                content: [{ type: "text", text: textInput }],
              });
            }
            // nothing to send
            if (userMessages.length === 0) {
              return;
            }
            setTextInput("");

            const { messages, newMessage } = await continueConversation([
              ...conversation,
              ...userMessages,
            ]);

            // collect the streamed assistant message
            let textContent = "";
            for await (const delta of readStreamableValue(newMessage)) {
              textContent = `${textContent}${delta}`;

              setConversation([
                ...messages,
                {
                  role: "assistant",
                  content: [{ type: "text", text: textContent }],
                },
              ]);
            }
          }}
        >
          Send Message
        </button>
      </div>
    </div>
  );
}

Because CoreMessage content can be either a string or an array of parts, the page branches on the content type when rendering a message. Strings and text parts are shown as text, and image parts are previewed with an <img /> tag whose src is a base64 data URL.
A second <input> with type="file" handles the upload. When a file is selected, FileReader reads it as a data URL, which is a base64 string with a data:<mime type>;base64, prefix.
Finally, when the Send Message button is clicked, the image and text inputs are converted into an array of CoreMessage objects. Note that the data URL prefix must be stripped from the image input, because the image field expects raw base64 data.
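The prefix-stripping step can be isolated into a small pure function. Here is a minimal sketch, assuming the input comes from FileReader.readAsDataURL. The hypothetical splitDataUrl helper also keeps the MIME type, which could be passed as the optional mimeType field of ImagePart. It uses a broader pattern than /^data:image\/\w+;base64,/ because \w does not match the + in image/svg+xml, so that narrower regex would leave the prefix in place for SVG files.

```typescript
// Hypothetical helper: split a base64 data URL, as produced by
// FileReader.readAsDataURL, into its MIME type and raw base64 payload.
function splitDataUrl(
  dataUrl: string,
): { mimeType: string; base64: string } | null {
  // Matches prefixes such as "data:image/png;base64," and "data:image/svg+xml;base64,"
  const match = /^data:([^;,]+);base64,/.exec(dataUrl);
  if (!match) {
    return null; // not a base64 data URL
  }
  return { mimeType: match[1], base64: dataUrl.slice(match[0].length) };
}

const parsed = splitDataUrl("data:image/svg+xml;base64,PHN2Zy8+");
console.log(parsed?.mimeType, parsed?.base64); // prints image/svg+xml PHN2Zy8+
console.log(splitDataUrl("hello")); // prints null
```

The result maps directly onto an image part: { type: "image", image: parsed.base64, mimeType: parsed.mimeType }.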

Body Size Config

By default, Next.js limits the request body of a Server Action to 1 MB (bodySizeLimit). To upload files larger than 1 MB, raise the limit in your Next.js config as follows.

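// next.config.js (CommonJS). In next.config.mjs, use "export default nextConfig;" instead.
/** @type {import('next').NextConfig} */
const nextConfig = {
    experimental: {
        serverActions: {
            // maximum Server Action request body size (default: 1mb)
            bodySizeLimit: '10mb'
        }
    }
};

module.exports = nextConfig;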
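Base64 makes the payload larger than the file itself. Every 3 bytes become 4 characters, so an image somewhat under 1 MB can still exceed the default limit once it is encoded. The page also sends the whole conversation, including earlier images, on every request. Here is a minimal sketch of a client-side pre-check. It assumes "1mb" means 1024 × 1024 bytes and ignores the rest of the request body. base64EncodedLength and fitsBodySizeLimit are hypothetical helpers.

```typescript
const ONE_MB = 1024 * 1024; // assumption: "1mb" means 1,048,576 bytes

// Length of the base64 encoding of `byteCount` bytes, including "=" padding:
// every started group of 3 input bytes becomes 4 output characters.
function base64EncodedLength(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical pre-flight check for a single image. It ignores the
// "data:...;base64," prefix, other messages, and serialization overhead.
function fitsBodySizeLimit(fileSizeBytes: number, limitBytes: number): boolean {
  return base64EncodedLength(fileSizeBytes) <= limitBytes;
}

console.log(base64EncodedLength(900 * 1024)); // prints 1228800
console.log(fitsBodySizeLimit(900 * 1024, ONE_MB)); // prints false
console.log(fitsBodySizeLimit(900 * 1024, 10 * ONE_MB)); // prints true
```

In the page, a check like this could run in the file input's onChange handler with file.size before calling getBase64.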

Let’s Test Now

I uploaded the cover image from my previous post, typed "What is this picture about?", and clicked the Send Message button.

The assistant's answer is quite impressive 👏👏👏.

References

Documentation for the AI SDK: https://sdk.vercel.ai/docs/introduction
Google AI Studio: https://ai.google.dev/aistudio

Conclusion

In this post, I've shown how to send images to Google Gemini from a Next.js front end, using a Server Action and the Vercel AI SDK.

If you're interested in seeing Google Gemini in action, check out these products that have successfully implemented it:

AI Math Solver: a web app that helps users solve math problems.

Have you used Google Gemini in your projects? Share your experiences in the comments below!
