Run Gemma on Google Colab Free tier

0xkoji

0xkoji

Posted on February 27, 2024

Run Gemma on Google Colab Free tier

What is Gemma?

Gemma is a family of 4 new LLM models by Google based on Gemini. It comes in two sizes: 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens

https://huggingface.co/blog/gemma

In this post, we will try to run Gemma on the Google Colab Free tier. To do that, we will need to use the quantized model since gemma-7b requires 18GB GPU RAM.

requirements

  • HuggingFace account
  • Google account

Step 1. Get access to Gemma

We can use Gemma with Transformers 4.38 but to do that first we need to get a grant to access the model.

https://huggingface.co/google/gemma-7b

Once you get a grant, you will see the below in the above page.

gemma model

Step 2. Add HF_TOKEN to Google Colab

We need to add HF_TOKEN to Google Colab to access gemma via Transformers.

First we need to get a token from Huggingface.
https://huggingface.co/settings/tokens

Then click the key icon in the sidebar on Google Colab like below.

Image description

Step 3. Install packages

!pip install -U "transformers==4.38.1" --upgrade
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
Enter fullscreen mode Exit fullscreen mode

Step 4. Write Python code to run Gemma

We can use gemma-7b model via transformers.

from transformers import AutoTokenizer, pipeline
import torch

model = "google/gemma-7b-it"
# use quantized model
pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True}
    },
)


messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
print(outputs[0]["generated_text"][len(prompt):])
Enter fullscreen mode Exit fullscreen mode

Result

The following is the result of the above code.
As you can see the output is wrong unfortunately. So at this moment , Gemma is missing the latest data or not a good model. 🥲

ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:

Key Features:

  • Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
  • Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
  • Conversation: It can engage in natural language conversation, answer questions, and provide information.
  • Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
  • Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.

Additional Information:

  • Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
  • Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
  • Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development
💖 💪 🙅 🚩
0xkoji
0xkoji

Posted on February 27, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Run Gemma on Google Colab Free tier
generativeai Run Gemma on Google Colab Free tier

February 27, 2024