How to Count Tokens in the Frontend for Popular LLMs: GPT, Claude, and Llama
ppaanngggg
Posted on May 21, 2024
Introduction
Today, apps built on Large Language Models (LLMs) are growing fast. People rely on LLMs to solve tough problems, and they matter in many areas like education, finance, healthcare, and more. Seeing this, developers worldwide are building lots of new apps on top of LLMs. These apps are changing how we live, work, and talk to each other.
Counting tokens before sending a prompt to the LLM is important for two reasons. First, it helps users manage their budget. Knowing how many tokens a prompt uses can prevent surprise costs. Second, it helps the LLM work better. The total tokens in a prompt should stay below the model's maximum context length. If it's more, the model might perform worse or even return errors.
Tokenizer in Backend vs Frontend
In text processing, the calculation of prompt tokens is a crucial task and there are essentially two methods to accomplish this.
Backend Implementation
The first, and often most common, solution is to run a tokenizer in the backend of the application and expose an API for the frontend to call. This method is generally straightforward to implement, especially given Python libraries like tiktoken and tokenizers that are designed specifically for this purpose and are incredibly user-friendly.
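For illustration, here is a minimal sketch of what such a counting endpoint could look like. To keep the post in one language it uses a Node.js backend with Express and the js-tiktoken package (a JavaScript port of tiktoken); the route name and server setup are my own assumptions, not part of the original post:

import express from "express";
import { getEncoding } from "js-tiktoken";

const app = express();
app.use(express.json());

// cl100k_base is the encoding used by GPT-4 and GPT-3.5-turbo
const enc = getEncoding("cl100k_base");

// The frontend must POST the full prompt text just to get back one number.
app.post("/api/count-tokens", (req, res) => {
  const { text } = req.body;
  res.json({ count: enc.encode(text).length });
});

app.listen(3001);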
However, there are some drawbacks. First, it's inefficient: the frontend must send large volumes of text to the backend just to receive a single number, which is especially wasteful for very long prompts. Second, it burns server CPU on constantly calculating tokens, work that adds little to the product's value. Finally, there is noticeable latency while a user is typing and waiting for the token count, which makes for a poor user experience.
Frontend Implementation
Thanks to transformers.js, we can run the tokenizer and model locally in the browser. Transformers.js is designed to be functionally equivalent to Hugging Face's transformers python library, meaning you can run the same pretrained models using a very similar API.
Installation
To install via NPM, run:
npm i @xenova/transformers
To run transformers.js on the client side of Next.js, you need to update the next.config.js file:
/** @type {import('next').NextConfig} */
const nextConfig = {
  // (Optional) Export as a static site
  // See https://nextjs.org/docs/pages/building-your-application/deploying/static-exports#configuration
  output: 'export', // Feel free to modify/remove this option

  // Override the default webpack configuration
  webpack: (config) => {
    // See https://webpack.js.org/configuration/resolve/#resolvealias
    config.resolve.alias = {
      ...config.resolve.alias,
      "sharp$": false,
      "onnxruntime-node$": false,
    };
    return config;
  },
};

module.exports = nextConfig;
Code Sample
- Firstly, you need to import AutoTokenizer from @xenova/transformers:
import { AutoTokenizer } from "@xenova/transformers";
- You can create a tokenizer using the AutoTokenizer.from_pretrained function, which requires the pretrained_model_name_or_path parameter. Xenova provides tokenizers designed for widely-used Large Language Models (LLMs) like GPT-4, Claude-3, and Llama-3. To access these, visit the Hugging Face website, a hub for Machine Learning resources, at huggingface.co/Xenova. The tokenizer configuration for the latest GPT-4o model is available at Xenova/gpt-4o. You can create a tokenizer for GPT-4o now:
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/gpt-4o');
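The same pattern works for the other models mentioned above. The repository names below are my assumptions; check huggingface.co/Xenova for the exact, current names:

// Hypothetical repo names; verify them on huggingface.co/Xenova
const claudeTokenizer = await AutoTokenizer.from_pretrained('Xenova/claude-tokenizer');
const llamaTokenizer = await AutoTokenizer.from_pretrained('Xenova/llama-3-tokenizer');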
- The usage of the tokenizer is very similar to the tokenizers library in Python. The tokenizer.encode method converts text into tokens:
const tokens = tokenizer.encode('hello world'); // [24912, 2375]
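Since encode returns an array of token IDs, the token count is simply the array's length:

const numTokens = tokens.length; // 2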
As you can see, the tokenizer in transformers.js is extremely easy to use, and it counts tokens at an impressive speed, fast enough to run in the browser on every keystroke.
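Putting the pieces together, here is a minimal sketch of how this might look inside a React component in a Next.js app. The component and handler names are illustrative, not from the original post:

"use client"; // required when using the Next.js app router

import { useEffect, useRef, useState } from "react";
import { AutoTokenizer } from "@xenova/transformers";

export default function TokenCounter() {
  const tokenizerRef = useRef(null);
  const [count, setCount] = useState(0);

  // Load the tokenizer once; its files are fetched from the Hugging Face Hub
  // on first use and cached by the browser afterwards.
  useEffect(() => {
    AutoTokenizer.from_pretrained("Xenova/gpt-4o").then((tokenizer) => {
      tokenizerRef.current = tokenizer;
    });
  }, []);

  const handleChange = (event) => {
    const tokenizer = tokenizerRef.current;
    if (!tokenizer) return; // tokenizer is still loading
    setCount(tokenizer.encode(event.target.value).length);
  };

  return (
    <div>
      <textarea onChange={handleChange} placeholder="Type your prompt..." />
      <p>{count} tokens</p>
    </div>
  );
}

Because everything runs in the browser, no request leaves the page while the user types, which removes the latency problem of the backend approach.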
Demo
Using this pure browser technique, I created an all-in-one website that provides token counters for all popular models.
You can test the GPT-4o tokenizer there. Here is a screenshot of the page.