【TypeScript】Displaying ChatGPT-like Streaming Responses with trpc in React

mikan3rd

Posted on July 3, 2024

Purpose

Chat services powered by generative AI, such as OpenAI's ChatGPT and Anthropic's Claude, use a UI that displays text gradually as it is streamed from the model. With a typical request/response flow, the client has to wait until the AI has finished processing entirely, which can mean showing a loading screen for tens of seconds depending on the task. Receiving and displaying the data little by little via streaming is therefore much better for UX, and as services built on AI increase, such use cases are likely to become more common.

From a development perspective, there are official examples of the server-side implementation, but I found few examples of receiving and displaying streamed data on the client side. So this time I put together an implementation example that supports OpenAI's streaming responses using trpc + React (Next.js), a stack I personally have high hopes for.

Premise

trpc

Move Fast and Break Nothing.
End-to-end typesafe APIs made easy.
Experience the full power of TypeScript inference to boost productivity
for your full-stack application.

https://trpc.io/

In short, trpc is an RPC framework that lets the client reuse the request and response type definitions written in TypeScript on the server side.

With a plain REST API, for example, there is no schema definition for requests and responses, so even if the client sends the wrong request or the server returns the wrong response, processing continues and unexpected errors can occur. You can define schemas by introducing tools like OpenAPI, but then you need additional tooling to check that the actual requests and responses match those definitions.

With GraphQL or gRPC, schema definitions are built into the specification, but because schemas are written in their own dedicated languages, they carry a learning cost, and since you must maintain both the actual code and the schema, every change has to touch both.
(Of course, each has its own advantages, so they cannot be evaluated simply.)

With trpc, on the other hand, the TypeScript type definitions written on the server side are directly reflected and reusable on the client side. You only need to know TypeScript, there is no separate schema to define, and changes stay cheap, which makes development very productive. The premise is that the server must also be implemented in TypeScript, but given how indispensable TypeScript has become in frontend development, I personally think writing the backend in TypeScript to match the frontend is a good choice.
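To make this concrete, here is a minimal sketch (illustrative names, not code from the sample repository): the server defines a procedure with zod validation and exports only a type, and the client infers every input and output type from it.



// server.ts — define a router; only its *type* is shared with the client
import { initTRPC } from '@trpc/server';
import { z } from 'zod';

const t = initTRPC.create();

export const appRouter = t.router({
  greet: t.procedure
    .input(z.object({ name: z.string().min(1) }))
    .query(({ input }) => `Hello, ${input.name}!`),
});

export type AppRouter = typeof appRouter;

// client.ts — no code generation; the types flow through the generic parameter
import { createTRPCProxyClient, httpBatchLink } from '@trpc/client';
import type { AppRouter } from './server';

const client = createTRPCProxyClient<AppRouter>({
  links: [httpBatchLink({ url: 'http://localhost:3000/api/trpc' })],
});

// `greeting` is inferred as string; a wrong input shape fails to compile
const greeting = await client.greet.query({ name: 'mikan3rd' });



If the server later changes the input of greet, the client code above stops compiling, which is exactly the productivity benefit described here.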

Additionally, while generative AI libraries are often assumed to be Python-only, TypeScript is in fact officially supported by libraries such as openai-node, anthropic-sdk-typescript, and langchainjs, which is another advantage of adopting TypeScript on the backend.

Deliverable

I built an app where, after entering a prompt and clicking a button, the response text appears gradually, just like ChatGPT!


Implementation

The app built for this article is available on GitHub:
https://github.com/mikan3rd/trpc-openai-stream-sample

Create a base for the trpc app

We will use the official Next.js + Prisma starter as the base.



pnpx create-next-app --example https://github.com/trpc/trpc --example-path examples/next-prisma-starter trpc-prisma-starter



ref: https://trpc.io/docs/example-apps

Enable streaming with openai-node on the server side

There is an official library to handle OpenAI's API with TypeScript, so let's add it.
https://github.com/openai/openai-node

To pass a prompt and receive a result ChatGPT-style, use openai.chat.completions.create(). To enable streaming, just add stream: true to the arguments.



import OpenAI from 'openai';

// https://platform.openai.com/docs/api-reference/streaming
const openai = new OpenAI({
  apiKey: process.env.OPEN_AI_API_KEY,
});

// `text` is the user's prompt, passed in from the caller
const stream = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'user',
      content: text,
    },
  ],
  stream: true,
});




Please note that in order to use the API, you need to issue an API key and set up billing in advance.
ref: https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key

Define an asynchronous generator function on the server side to control the stream

In trpc, you can implement a streaming response by defining an asynchronous generator function with async function* and returning an AsyncGenerator object. The fullContent variable below accumulates the complete text, which is useful if you want to save the final result to a DB or similar (see the sketch after the snippet).



// Inside the async generator (async function*) that created `stream` above
let fullContent = '';
for await (const chunk of stream) {
  const targetIndex = 0;
  const target = chunk.choices[targetIndex];
  const content = target?.delta?.content ?? '';
  yield content; // hand each delta to the client as soon as it arrives

  fullContent += content; // accumulate the complete text
}

console.log({ fullContent });



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/server/functions/openai.ts
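For example, since the starter already includes Prisma, you could persist fullContent once the loop finishes. The following is a hypothetical sketch, not code from the sample repo; the message model and its fields are assumptions.



import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// ...after the for-await loop inside the async generator:
// (assumes a hypothetical `message` model in the Prisma schema)
await prisma.message.create({
  data: { role: 'assistant', content: fullContent },
});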

In the router, just delegate to an asynchronous generator function with yield*. (In the linked commit this is the chatLangChain function, which is introduced in the Bonus section below.)



openai: publicProcedure
  .input(
    z.object({
      text: z.string().min(1),
    }),
  )
  .query(async function* ({ input }) {
    yield* chatLangChain({ modelType: 'openai', text: input.text });
  }),



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/707892551c509863dc7d5006b219a7696b729401/src/server/routers/_app.ts

ref: https://trpc.io/docs/client/links/httpBatchStreamLink#generators

For those who might not be familiar with generators, here is a reference article (in Japanese).
(I had only ever used them with redux-saga.)
https://zenn.dev/qnighy/articles/112af47edfda96
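As a quick illustration (not from the sample app), an async generator pauses at each yield, and the consumer pulls values one at a time with for await:



// An async generator yields values lazily, one per iteration
async function* countUp(limit: number) {
  for (let i = 1; i <= limit; i++) {
    yield i; // pauses here until the consumer requests the next value
  }
}

(async () => {
  for await (const n of countUp(3)) {
    console.log(n); // logs 1, then 2, then 3
  }
})();



The trpc procedure above works the same way: each yield content hands one chunk to the client as soon as it is available.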

Replace with httpBatchStreamLink on the client side

To receive the stream on the client side, use unstable_httpBatchStreamLink in the links argument of createTRPCNext.



import { unstable_httpBatchStreamLink } from '@trpc/client';

// inside the `links` array of the createTRPCNext config
// (see the linked file below for the full setup)
unstable_httpBatchStreamLink({
  url: `${getBaseUrl()}/api/trpc`,
}),



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/utils/trpc.ts

ref: https://trpc.io/docs/client/links/httpBatchStreamLink#streaming-mode

Although it is marked unstable_, it is safe to use in production; the prefix only indicates that the API may still change.
(experimental_, by contrast, would indicate insufficient testing.)

https://trpc.io/docs/faq#unstable

Display data on the client side

As with ordinary data fetching, call useQuery(); the chunks received so far are reflected in the data field of the return value as the API streams them, so you can simply render them.



import { Fragment, useState } from 'react';
import { keepPreviousData } from '@tanstack/react-query';

import { trpc } from '~/utils/trpc';

const [inputText, setInputText] = useState<string>(
  'Please briefly explain ChatGPT, Claude, and LangChain',
);

const openai = trpc.examples.openai.useQuery(
  { text: inputText },
  {
    enabled: false, // do not fetch on mount; we trigger it manually via refetch()
    placeholderData: keepPreviousData,
  },
);
const submitByQuery = async () => {
  await openai.refetch();
};

return (
  <p className="py-4 break-all whitespace-pre-wrap">
    {/* `data` is the array of chunks received so far; it grows as the stream arrives */}
    {openai.data?.map((chunk, index) => (
      <Fragment key={index}>{chunk}</Fragment>
    ))}
  </p>
);



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/pages/index.tsx

ref: https://trpc.io/docs/client/react/useQuery#streaming

By the way, @trpc/react-query wraps @tanstack/react-query. If you want to fetch at an arbitrary timing, set enabled: false and call refetch().

Also, when the variables (here, the { text: inputText } part) change, data is reset, so if you want to keep showing the previous data until the refetch completes, specify placeholderData: keepPreviousData.

That's all!

Once trpc is set up, a ChatGPT-like UI can be implemented with simple, minimal code.

Also, if you want to save data, you would use a mutation instead of a query. I tried implementing that as well, but data remained an AsyncGenerator object, so it has to be consumed with for await as shown below.



const [text, setText] = useState<string>('');
const openai2 = trpc.examples.openai2.useMutation();
const submitByMutation = async () => {
  openai2.mutate(
    { text: inputText },
    {
      onSuccess: async (data) => {
        setText('');
        // `data` is an AsyncGenerator, so consume it chunk by chunk
        for await (const val of data) {
          setText((prev) => prev + val);
        }
      },
    },
  );
};



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/pages/index.tsx

(Whether this can be improved is currently under discussion in the following issue.)
https://github.com/trpc/trpc/issues/5846

Bonus

While I was at it, I also implemented the same thing with Claude and LangChain.

Claude

I also added support for Anthropic's Claude, which has been getting favorable reviews recently. The API specification differs slightly, but the implementation is almost the same.

https://github.com/anthropics/anthropic-sdk-typescript



import Anthropic from '@anthropic-ai/sdk';

// https://docs.anthropic.com/en/api/messages-streaming
const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export const messageCreateStream = async function* (text: string) {
  const stream = await anthropic.messages.create({
    max_tokens: 1024,
    messages: [{ role: 'user', content: text }],
    model: 'claude-3-haiku-20240307',
    stream: true,
  });

  let fullContent = '';
  for await (const messageStreamEvent of stream) {
    switch (messageStreamEvent.type) {
      case 'content_block_delta':
        switch (messageStreamEvent.delta.type) {
          case 'text_delta': {
            const text = messageStreamEvent.delta.text;
            yield text;
            fullContent += text;
            break;
          }

          default:
            break;
        }
        break;

      default:
        break;
    }
  }

  console.log({ fullContent });
};



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/server/functions/anthropicAI.ts

Here too, you need to issue an API key and set up billing in advance.

LangChain

By using LangChain, the OpenAI and Anthropic LLMs can be called with shared code. LangChain seems the more convenient choice if you want to switch between or compare multiple LLMs.

https://github.com/langchain-ai/langchainjs



import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

// https://js.langchain.com/v0.2/docs/how_to/streaming/
const chatOpenAI = new ChatOpenAI({
  apiKey: process.env.OPEN_AI_API_KEY,
  model: 'gpt-3.5-turbo',
});

const chatAnthropic = new ChatAnthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: 'claude-3-haiku-20240307',
});

export const chatLangChain = async function* (args: {
  modelType: 'openai' | 'anthropic';
  text: string;
}) {
  const { modelType, text } = args;

  const model = (() => {
    switch (modelType) {
      case 'openai':
        return chatOpenAI;
      case 'anthropic':
        return chatAnthropic;
      default:
        // eslint-disable-next-line @typescript-eslint/restrict-template-expressions
        throw new Error(`Unknown modelType: ${modelType}`);
    }
  })();

  const stream = await model.stream([['user', text]]);

  let fullContent = '';
  for await (const chunk of stream) {
    const { content } = chunk;

    if (typeof content !== 'string') {
      console.log({ content });
      throw new Error('Expected content to be a string');
    }

    yield content;

    fullContent += content;
  }

  console.log({ fullContent });
};



https://github.com/mikan3rd/trpc-openai-stream-sample/blob/431de4780d0c3f8f7494d8265f71cd686c0e55f0/src/server/functions/langchain.ts

Epilogue

I hope trpc becomes more widespread.
