Deploy Mistral LLM on Google Compute Engine with Docker, GPU Support, and Hugging Face Inference Server
Mahmoud Sehsah
Posted on January 21, 2024
Introduction
This is a practical guide to running Large Language Models (LLMs) on Google Compute Engine with GPU support. It walks you through the process step by step, making it easy to take advantage of the powerful combination of Google's cloud infrastructure and NVIDIA's GPU technology.
Machine Specs for the Tutorial
Hardware Specifications:
GPU Information:
- GPU Type: Nvidia T4
- Number of GPUs: 2
- GPU Memory: 16 GB GDDR6 (per GPU)
Google Compute Engine Machine Type:
- Type: n1-highmem-4
- vCPUs: 4
- Cores: 2
- Memory: 26 GB
Disk Information:
- Disk Type: Balanced Persistent Disk
- Disk Size: 150 GB
Software Specifications:
Operating System:
- Ubuntu Version: 20.04 LTS
CUDA version:
- CUDA version: 12.3
1. Setting Up Docker
Follow these simple steps to get Docker up and running on your system:
1.1 Adding Docker's Official GPG Key
Add Docker's official GPG key to your system. This step is crucial for validating the authenticity of the Docker packages you'll be installing:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
1.2 Adding Docker Repository to Apt Sources
Add Docker's repository to your system's Apt sources. This allows you to fetch Docker packages from their official repository:
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
1.3 Installing Docker
Install Docker with the following command (the --reinstall flag forces a fresh installation of the package even if it is already present):
sudo apt-get --reinstall install docker-ce
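To confirm the installation succeeded, you can print the installed version:
docker --version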
1.4 Create Docker Group
If not already present, add the 'docker' group to your system:
sudo groupadd docker
1.5 Add Default User to Docker Group
Add your default user to the 'docker' group to manage Docker as a non-root user:
sudo usermod -aG docker $USER
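Note that the group change only takes effect after you log out and back in. As a shortcut, you can apply it to your current shell session instead:
newgrp docker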
1.6 Check Docker Status
Confirm that the Docker service is active and running:
systemctl status docker
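As an optional sanity check, assuming your docker group membership is active (otherwise prefix the command with sudo), you can run the standard hello-world container to verify that Docker can pull and run images:
docker run hello-world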
2. Install NVIDIA Container Toolkit
2.1 Add NVIDIA GPG Key and NVIDIA Container Toolkit Repository
Start by adding the NVIDIA GPG key, which ensures the authenticity of the software packages, and register the NVIDIA Container Toolkit repository in your system's software sources:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
2.2 Enable Experimental Features (Optional)
If you wish to use experimental features, uncomment the respective lines in the sources list:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
2.3 Update Package Index and Install the NVIDIA Container Toolkit
Update your package index and install the NVIDIA Container Toolkit:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
3. Configure the Container Toolkit
3.1 Configure NVIDIA Container Toolkit
Configure the NVIDIA Container Toolkit to work with Docker:
sudo nvidia-ctk runtime configure --runtime=docker
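This command registers the NVIDIA runtime in Docker's daemon configuration. If you want to inspect what it changed, you can print the resulting file:
cat /etc/docker/daemon.json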
3.2 Restart Docker Service
Apply the changes by restarting the Docker service:
sudo systemctl restart docker
4. Prerequisites Before Installing CUDA Drivers
Ensure your system meets the following prerequisites before proceeding with the CUDA driver installation. For detailed guidance, refer to the official NVIDIA CUDA installation guide (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions).
4.1 Verify CUDA-Capable GPU
First, confirm that your system has an NVIDIA GPU installed. This command should return information about the NVIDIA graphics card if one is present:
lspci | grep -i nvidia
4.2 Confirm Supported Linux Version
Ensure your Linux distribution is supported by checking its version. This command displays your system's architecture and the details of your Linux distribution:
uname -m && cat /etc/*release
4.3 Check Kernel Headers and Development Packages
Check your running kernel version. The kernel headers and development packages matching this exact version are essential for building the NVIDIA kernel module:
uname -r
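If you want to check whether the matching headers package is already installed, one way is to query dpkg (if it reports the package as not installed, step 5.1 below will install it):
dpkg-query -s linux-headers-$(uname -r)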
5. Installing NVIDIA Drivers
Follow these steps to install NVIDIA drivers on your system. For detailed instructions, you can refer to the NVIDIA Tesla Installation Notes (https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html).
5.1 Install Required Kernel Headers
Start by installing the Linux kernel headers corresponding to your current kernel version:
sudo apt-get install linux-headers-$(uname -r)
5.2 Add the NVIDIA CUDA Repository
Identify your distribution's version and add the NVIDIA CUDA repository to your system:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
5.3 Update and Install NVIDIA Drivers
Finally, update your package lists and install the CUDA drivers. A reboot is required after the installation completes:
sudo apt-get update
sudo apt-get -y install cuda-drivers
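Then reboot the machine so the new kernel module is loaded cleanly:
sudo reboot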
6. Post-Installation Steps for NVIDIA Driver
After successfully installing the NVIDIA drivers, perform the following post-installation steps to ensure everything is set up correctly. For a comprehensive guide, consult the NVIDIA CUDA Installation Guide for Linux (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions).
6.1 Verify NVIDIA Persistence Daemon
Check the status of the NVIDIA Persistence Daemon to ensure it's running correctly:
systemctl status nvidia-persistenced
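If the daemon is reported as inactive on your system, you can start it and enable it at boot, which keeps the driver initialized even when no client is connected:
sudo systemctl enable --now nvidia-persistenced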
6.2 Monitor GPU Utilization
To confirm that your GPU is recognized and monitor its utilization, use:
nvidia-smi
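To verify that containers can see the GPUs as well, you can run nvidia-smi from inside a minimal container using the runtime configured in step 3:
docker run --rm --gpus all ubuntu nvidia-smi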
7. Define Model Configuration
To set up the model configuration, define the following environment variables. The variable model is set to mistralai/Mistral-7B-v0.1, the model used in this tutorial. The variable volume is set to the present working directory ($PWD) followed by /data, the directory where the model weights will be downloaded and stored:
export model=mistralai/Mistral-7B-v0.1
export volume=$PWD/data
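Optionally, you can create the data directory up front so it is owned by your user; otherwise Docker creates it as root when mounting the volume:
mkdir -p "$volume"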
8. Run text-generation-inference Using Docker
To perform text generation inference, we will use the Hugging Face text-generation-inference server (for more details, see https://huggingface.co/docs/text-generation-inference/index). Execute the Docker command below; here is what each parameter does:
- --gpus all: Enables GPU support for Docker containers.
- --shm-size 1g: Sets the shared memory size to 1 gigabyte.
- -p 8080:80: Maps port 8080 on the host to port 80 in the Docker container.
- -v $volume:/data: Mounts the local data volume specified by $volume inside the Docker container at the /data path.
- ghcr.io/huggingface/text-generation-inference:1.3: Specifies the Docker image for text-generation-inference with the version tag 1.3.
- --model-id $model: Passes the specified model identifier ($model) to the text-generation-inference application.
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model
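If you prefer to keep the server running in the background, a detached variant of the same command would look like this ("tgi" is an arbitrary container name chosen for this example), with the server output available through docker logs:
# run detached; "tgi" is a hypothetical container name for convenience
docker run -d --name tgi --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model
# follow the server logs to watch the model load
docker logs -f tgi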
9. Check GPU Utilization
Run the GPU monitoring command again to check memory utilization once the model weights have been loaded into GPU memory:
nvidia-smi
10. Test the API Endpoint
To test the API endpoint, use the following curl command:
curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
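The server also exposes a streaming endpoint, /generate_stream, which returns tokens incrementally as server-sent events. You can try it with a similar curl call:
curl 127.0.0.1:8080/generate_stream -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'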