Run Gemma on Google Colab Free tier
0xkoji
Posted on February 27, 2024
What is Gemma?
Gemma is a family of 4 new LLM models by Google based on Gemini. It comes in two sizes: 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens
https://huggingface.co/blog/gemma
In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to use a quantized model, since gemma-7b requires 18GB of GPU RAM, which is more than the free tier's GPU offers.
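A rough back-of-the-envelope check of why quantization helps. This sketch counts only the weights and assumes roughly 8.5 billion parameters for gemma-7b (an approximate figure that is not stated in this post); activations and the KV cache add more on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: gemma-7b has roughly 8.5 billion parameters.
GEMMA_7B_PARAMS = 8.5e9

fp16_gb = weight_memory_gb(GEMMA_7B_PARAMS, 16)
int4_gb = weight_memory_gb(GEMMA_7B_PARAMS, 4)
print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:   ~{int4_gb:.1f} GB")
```

Under that assumption, the float16 estimate lands close to the 18GB figure, while 4-bit weights need roughly a quarter of that.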
Requirements
- Hugging Face account
- Google account
Step 1. Get access to Gemma
We can use Gemma with Transformers 4.38, but first we need to be granted access to the model.
https://huggingface.co/google/gemma-7b
Once access is granted, the page above shows a notice confirming it.
Step 2. Add HF_TOKEN to Google Colab
We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.
First, we need to get a token from Hugging Face.
https://huggingface.co/settings/tokens
Then click the key icon in the sidebar on Google Colab and add the token as a secret named HF_TOKEN.
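Once the secret is saved, a notebook cell can read it to confirm it is set. This is a minimal sketch, assuming the secret is named HF_TOKEN and that notebook access is enabled for it. `get_hf_token` is a hypothetical helper, and the environment-variable fallback is only there so the same cell also works outside Colab.

```python
import os

def get_hf_token(name: str = "HF_TOKEN"):
    """Return the token from Colab secrets, or from the environment outside Colab."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name)
    return userdata.get(name)

token = get_hf_token()
print("HF_TOKEN found" if token else "HF_TOKEN is not set")
```

Avoid printing the token itself, since notebook output is easy to share by accident.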
Step 3. Install packages
!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
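After installing, it can be worth confirming which versions the runtime actually picked up. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(["transformers", "accelerate", "bitsandbytes"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If a version looks stale, restarting the Colab runtime after the install usually makes the new packages visible.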
Step 4. Write Python code to run Gemma
We can use the instruction-tuned gemma-7b-it model via Transformers, loaded in 4-bit.
from transformers import BitsAndBytesConfig, pipeline
import torch

model_id = "google/gemma-7b-it"

# Load the model in 4-bit so it fits in the free tier's GPU memory
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": BitsAndBytesConfig(load_in_4bit=True),
    },
)

messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
# Format the conversation with Gemma's chat template and open the model's turn
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
# generated_text includes the prompt, so strip it and print only the reply
print(outputs[0]["generated_text"][len(prompt):])
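To see what apply_chat_template and the final slice are doing, here is a pure-Python approximation that runs without downloading the model. It assumes Gemma's turn format (`<start_of_turn>` and `<end_of_turn>` markers, with the assistant role named `model`). `format_gemma_prompt` is a hypothetical stand-in, not the tokenizer's real implementation, and the generated text is a made-up placeholder.

```python
def format_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate the prompt string that Gemma's chat template builds."""
    prompt = "<bos>"
    for message in messages:
        # Assumption: Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        prompt += f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so generation continues as the reply.
        prompt += "<start_of_turn>model\n"
    return prompt

messages = [{"role": "user", "content": "Tell me about ChatGPT"}]
prompt = format_gemma_prompt(messages)

# The text-generation pipeline returns the prompt followed by the completion,
# so slicing off len(prompt) characters leaves only the model's reply.
generated_text = prompt + "(model reply goes here)"
reply = generated_text[len(prompt):]
print(repr(prompt))
print(reply)
```

In the real code the tokenizer does this formatting, so the sketch is only useful for understanding why the prompt has to be sliced off before printing.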
Result
The following is the result of the above code.
Unfortunately, as you can see, the output is wrong: ChatGPT was developed by OpenAI, not Google, and it is not open source. So at this moment, either Gemma is missing the latest data or it is not a good model. 🥲
ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:
Key Features:
- Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
- Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
- Conversation: It can engage in natural language conversation, answer questions, and provide information.
- Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
- Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.
Additional Information:
- Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
- Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
- Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development