How to Set Up and Run Ollama on a GPU-Powered VM (vast.ai)

AIRabbit

Posted on October 27, 2024

In this tutorial, we'll walk through setting up and using Ollama for private model inference on a GPU-powered VM. Ollama lets you run models privately, keeping your data secure, while the GPU delivers fast inference.

Outline

  1. Set up a VM with GPU on Vast.ai
  2. Start Jupyter Terminal
  3. Install Ollama
  4. Run Ollama Serve
  5. Test Ollama with a model

Setting Up a VM with GPU on Vast.ai

1. Create a VM with GPU:

  • Visit Vast.ai to create your VM.
  • Choose a VM with at least 30 GB of storage to hold the models; suitable instances are available for under $0.30 per hour.
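
If you prefer the command line over the web UI, vast.ai also ships a CLI. The sketch below is illustrative only: the filter fields (`disk_space`, `dph`) and the exact query syntax are assumptions, so check `vastai search offers --help` before relying on them.

```shell
# Hypothetical sketch using the vastai CLI (installed via pip).
# YOUR_API_KEY is a placeholder for the key from your vast.ai account page.
pip install vastai
vastai set api-key YOUR_API_KEY

# Look for offers with at least 30 GB of disk priced under $0.30/hour.
# Field names here are assumptions; verify with `vastai search offers --help`.
vastai search offers 'disk_space >= 30 dph < 0.3'
```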

2. Start Jupyter Terminal:

  • Once your VM is up and running, open a terminal in Jupyter.

Downloading and Running Ollama

  1. Install Ollama: Run the official install script:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
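
After the script finishes, it helps to confirm the binary landed on your PATH and that the VM's GPU is visible (a minimal check, assuming a standard NVIDIA instance):

```shell
# Confirm the ollama binary is installed and print its version.
command -v ollama || echo "ollama not found on PATH"
ollama --version

# On a GPU VM, nvidia-smi should list the card Ollama will use.
nvidia-smi --query-gpu=name,memory.total --format=csv
```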

2. Run Ollama Serve:

  • Start the server in the background:

```bash
ollama serve &
```
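
Once `ollama serve` is running, you can confirm the API is up before pulling any models. By default Ollama listens on localhost:11434, and its root endpoint replies with a short status string:

```shell
# With `ollama serve` running in the background, poll until the API answers.
# The root endpoint returns "Ollama is running" once the server is ready.
until curl -fsS http://localhost:11434/ | grep -q "Ollama is running"; do
  sleep 1
done
echo "Ollama API is ready"
```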

3. Test Ollama with a Model:

  • Test your setup with a sample model such as Mistral (the first run downloads the model weights):

```bash
ollama run mistral
```
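
Beyond the interactive CLI, the same model can be queried over Ollama's REST API, which is handy for scripting. The prompt below is just an example:

```shell
# Non-streaming generation request against the local Ollama API.
# If the model is not yet present, the first request triggers its download.
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```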

By following these steps, you can run Ollama for private model inference on a GPU-backed VM. Happy prompting!
