Fine Tuning LLMs: Training with Cloud Resources

admantium

Sebastian

Posted on November 11, 2024

Fine-tuning LLMs with 7B or more parameters requires substantial hardware resources. One option is to build an on-premise computer with powerful and costly GPUs. The other option is to use cloud environments, including free services, like Colab and Kaggle, and paid services, like Replicate and Paperspace. These environments offer Jupyter notebooks in which you can run your LLM fine-tuning code. However, they come with constraints and limitations that need to be considered, such as the maximum amount of time that a notebook can run.

This article contains eight tricks for working with such cloud environments. You will learn how to inspect the cloud environment, define workloads to run on CPU or GPU, save and export training results, and prevent session timeouts.

This article originally appeared at my blog admantium.com.

Overview

The tricks cover the following aspects:

  • Inspect
    • Hardware Specification
    • Library and Binary Versions
  • Setup
    • Library Version Pinning
    • Binary Version Pinning and Execution
    • Prevent Data Logging to External Providers
  • Execution
    • Periodically Save Training Artifacts
    • Manually Export Training Artifacts
    • Prevent Session Timeout

Inspect

Hardware Specification

To see the details of the available hardware, use the following script:

# Source: https://www.kaggle.com/code/lukicdarkoo/kaggle-machine-specification-cpu-gpu-ram-os

import subprocess

from GPUtil import showUtilization as gpu_usage

def run(command):
    # Run a shell command and print its decoded standard output
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
    out, err = process.communicate()
    print(out.decode('utf-8').strip())

print('# CPU')
run('cat /proc/cpuinfo | egrep -m 1 "^model name"')
run('cat /proc/cpuinfo | egrep -m 1 "^cpu MHz"')
run('cat /proc/cpuinfo | egrep -m 1 "^cpu cores"')

print('# RAM')
run('cat /proc/meminfo | egrep "^MemTotal"')

print('# OS')
run('uname -a')

print('# GPU')
run('nvidia-smi')
gpu_usage()  # prints the compact ID / GPU / MEM table

Example output from Kaggle:

# CPU
model name : Intel(R) Xeon(R) CPU @ 2.00GHz
cpu MHz  : 2000.174
cpu cores : 2
# RAM
MemTotal:       32880784 kB
# OS
Linux 5b1f6e39bcfd 5.15.133+ #1 SMP Tue Dec 19 13:14:11 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
# GPU
Fri Mar 29 09:49:48 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0              25W / 250W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
| ID | GPU | MEM |
------------------
|  0 |  0% |  0% |
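If GPUtil is not installed in your environment, a similar summary can be collected with the standard library alone. The following is a minimal sketch, not part of the original script: the function name `hardware_summary` is hypothetical, and it assumes a Linux host for the RAM lookup and an `nvidia-smi` binary on the PATH for the GPU lookup. Both lookups are skipped quietly when unavailable.

```python
import os
import shutil
import subprocess


def hardware_summary():
    """Collect a few hardware facts using only the standard library."""
    summary = {"cpu_count": os.cpu_count(), "mem_total_kb": None, "gpus": []}

    # /proc/meminfo only exists on Linux; skip silently elsewhere.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal"):
                    summary["mem_total_kb"] = int(line.split()[1])
                    break
    except OSError:
        pass

    # Query GPU name and total memory only if nvidia-smi is on the PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            summary["gpus"] = [
                line.strip() for line in result.stdout.splitlines() if line.strip()
            ]
    return summary


print(hardware_summary())
```

Returning a dictionary instead of printing directly makes it easy to branch on the result, for example to pick a smaller batch size when no GPU is listed.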

Library and Binary Versions

To see all installed libraries in your notebook, run this:

!python --version
# Python 3.10.13

!conda list
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
absl-py                   1.4.0                    pypi_0    pypi
accelerate                0.28.0                   pypi_0    pypi
# ....
transformers              4.38.2                   pypi_0    pypi
# ....

To see which other binaries are installed:

!find /usr/bin -executable|sort

/usr/bin
/usr/bin/7z
/usr/bin/7za
/usr/bin/7zr
/usr/bin/X11
/usr/bin/apt
/usr/bin/bash
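The full listing is long. When you only care about a handful of tools, you can check for them directly from Python. This is a small sketch using `shutil.which`; the helper name `locate_binaries` and the list of binary names are examples chosen for illustration.

```python
import shutil


def locate_binaries(names):
    """Map each binary name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Example names only; replace them with the tools your project needs.
for name, path in locate_binaries(["python", "git", "zip", "nvidia-smi"]).items():
    print(f"{name:12} {path or 'not found'}")
```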

Setup

Library Version Pinning

Most cloud notebooks come with pre-installed libraries, and most published notebooks install the latest version of every dependency. That works at the time the notebook is published. If you try a notebook that is six months old, chances are high that it no longer runs.

Software libraries evolve, including changes to their APIs, available parameters and return types. Therefore, it is crucial to pin versions strictly in your projects, so that what runs today still runs six months from now.

Here is an example from my Kaggle LLM Fine-Tuning notebook:

# Transformers installation
!pip install -U transformers==4.30 tensorflow==2.15
!pip install accelerate==0.27.2 peft==0.10.0 bitsandbytes==0.43.0 trl==0.8.1 datasets==2.1.0
!pip install einops==0.7.0 fsspec==2024.2.0
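Pre-installed packages can shadow your pins, so it can help to verify after installation that the versions you asked for are the ones actually importable. The sketch below compares a dictionary of expected versions against `importlib.metadata`. The helper name `check_pins` is hypothetical. The pins are copied from the install commands above and must be written out in full (e.g. `0.27.2`), because the comparison is an exact string match.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


pins = {"accelerate": "0.27.2", "peft": "0.10.0", "trl": "0.8.1"}
print(check_pins(pins) or "all pins satisfied")
```

Running this as the first cell after the install commands surfaces a mismatch immediately, before it shows up as an obscure error halfway through training.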

Binary Version Pinning and Execution

Some projects require you to use a specific version of an installed binary, such as Python.

An internet search reveals a plethora of methods, some dating back several years, including Linux install commands, pipx, pyenv and conda.

In environments where conda is available, you can install a specific Python version as shown:

!conda create -n py3.8 -y \
  && source /opt/conda/bin/activate py3.8 \
  && conda install python=3.8 -y \
  && python --version

When using this specific binary, you need to consider that each command in a Jupyter notebook is essentially a one-off command: the activated environment does not carry over to the next one. Therefore, you need to prepend the activation to all commands, and chain the commands together, like this:

!source /opt/conda/bin/activate py3.8 \
 && python --version \
 && cd llm-evaluation \
 && pip install -r requirements.txt
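Typing the activation prefix in every cell gets repetitive. One way to avoid that is to build the chained command string in Python. This is a sketch with a hypothetical helper name, `in_conda_env`; the activate path and environment name are the ones used above.

```python
import shlex


def in_conda_env(env, commands, activate="/opt/conda/bin/activate"):
    """Join shell commands with && behind a `source <activate> <env>` prefix."""
    prefix = f"source {shlex.quote(activate)} {shlex.quote(env)}"
    return " && ".join([prefix, *commands])


cmd = in_conda_env(
    "py3.8",
    ["python --version", "cd llm-evaluation", "pip install -r requirements.txt"],
)
print(cmd)
```

In a notebook cell you can then run the result with `!{cmd}`, or with `subprocess.run(["bash", "-c", cmd], check=True)` if you prefer an error to be raised when any step fails.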

Prevent Data Logging to External Providers

Some cloud environments automatically capture telemetry data and send it to external providers.

On Kaggle, the wandb library is installed, which is invoked during training automatically. If you do not need it, you can uninstall it with this command:

!pip uninstall wandb -y

Alternatively, you can set an environment variable that puts wandb into offline mode, so that runs are written locally and not uploaded.

import os

os.environ["WANDB_MODE"] = "offline"

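When using the Hugging Face Trainer, disable all reporting integrations by passing the string "none". Passing None instead leaves the default in place, which in the transformers releases used here reports to every installed integration:

from transformers import TrainingArguments

args = TrainingArguments(
    ...
    report_to="none",  # the string "none", not the Python value None
)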
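If you prefer a single opt-out cell at the top of the notebook, the environment variables can be collected in one place. This is a sketch under two assumptions that go beyond the text above: it uses `WANDB_MODE=disabled`, which turns wandb into a no-op instead of logging locally as `offline` does, and it also sets `HF_HUB_DISABLE_TELEMETRY`, the opt-out variable of the huggingface_hub library. The helper name `disable_telemetry` is hypothetical. Run it before importing the libraries, since they may read these variables at import time.

```python
import os

# Assumed opt-out settings; adjust to the libraries you actually use.
TELEMETRY_OPT_OUT = {
    "WANDB_MODE": "disabled",         # wandb calls become no-ops
    "HF_HUB_DISABLE_TELEMETRY": "1",  # huggingface_hub usage telemetry off
}


def disable_telemetry(env=os.environ):
    """Write the opt-out variables into env and return what was set."""
    for key, value in TELEMETRY_OPT_OUT.items():
        env[key] = value
    return dict(TELEMETRY_OPT_OUT)


disable_telemetry()
```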

Execution

Periodically Save Training Artifacts

Some cloud environments do not guarantee a runtime duration, so a session can end before training finishes. Therefore, you should save training results automatically and periodically.

With the HuggingFace Trainer library, use this:

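from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama-7b-qlora-instruct",
    save_steps=1,  # a checkpoint after every step; raise this to save less often
)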
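Saved checkpoints only pay off if you continue from them after a session is cut short. The sketch below finds the newest `checkpoint-<step>` directory inside the output directory with the standard library. The helper name `latest_checkpoint` is hypothetical, and it assumes the Trainer's usual `checkpoint-<step>` directory naming.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps numerically so checkpoint-80 beats checkpoint-9.
            candidates.append((int(match.group(1)), child))
    return max(candidates, key=lambda item: item[0])[1] if candidates else None


print(latest_checkpoint("./llama-7b-qlora-instruct"))
```

The returned path can be handed to the Trainer with `trainer.train(resume_from_checkpoint=str(path))` to continue from that step.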

With TensorFlow, you need to create a Checkpoint and a CheckpointManager object, and call the manager's save method inside the training loop.

# source: https://www.tensorflow.org/guide/checkpoint
ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

def train_and_checkpoint(net, manager):
  #...
  for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    save_path = manager.save()
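Since session limits are expressed in time, not in steps, it can be convenient to trigger saves by elapsed time. The following is a framework-agnostic sketch and not part of either library: the class name `PeriodicSaver` and its parameters are hypothetical, and `save_fn` stands for whatever save call your training loop already uses.

```python
import time


class PeriodicSaver:
    """Call save_fn whenever at least interval_seconds have passed since the last save."""

    def __init__(self, save_fn, interval_seconds, clock=time.monotonic):
        self.save_fn = save_fn
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.last_saved = clock()

    def maybe_save(self):
        """Save if the interval has elapsed; return True when a save happened."""
        now = self.clock()
        if now - self.last_saved >= self.interval_seconds:
            self.save_fn()
            self.last_saved = now
            return True
        return False
```

In the TensorFlow loop above, you could create `saver = PeriodicSaver(manager.save, 600)` before the loop and replace the unconditional `manager.save()` with `saver.maybe_save()`, which writes a checkpoint roughly every ten minutes.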

Manually Export Training Artifacts

Output data resides in the virtual machine instance of the cloud provider. To get this data out, you have several provider-specific and provider-agnostic solutions. Here is a list, ordered from most generic to most specific:

Download via GUI

Some environments offer an option to download files from a dedicated directory path. First, create a zip file via a bash command, e.g. !zip -r file.zip "/kaggle/working/llama-7b-qlora-instruct/checkpoint-80". Second, download this zip file.

In Colab, you can access the file explorer via the GUI. Or you can trigger a download dialog by executing this snippet:

from google.colab import files
files.download(zipfile_name)

In Kaggle, you can also use the GUI, or open a clickable link with this code:

from IPython.display import FileLink, display
display(FileLink(zipfile_name))
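The zip step can also be done from Python, which is handy when the zip binary is missing or when you want the archive name in a variable for the download snippets above. This is a minimal sketch using `shutil.make_archive`; the helper name `zip_directory` is hypothetical.

```python
import shutil
from pathlib import Path


def zip_directory(source_dir, archive_stem):
    """Zip source_dir into <archive_stem>.zip and return the archive path."""
    source = Path(source_dir).resolve()
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_stem), "zip", root_dir=str(source.parent), base_dir=source.name
    )
```

For example, `zipfile_name = zip_directory("/kaggle/working/llama-7b-qlora-instruct/checkpoint-80", "checkpoint-80")` creates `checkpoint-80.zip` and returns its path.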

Upload to Cloud Storage

Another option is to upload the results to a cloud storage repository. This requires handing access credentials to the notebook environment, so only do this if you trust the environment with them.

For accessing Google Drive from Colab, use the following snippet. It starts an interactive login and then mounts the drive at the specified mount point.

from google.colab import drive
drive.mount('/content/gdrive')

For accessing Amazon cloud storage, use the boto3 library:

import boto3

s3_client = boto3.client('s3')
# upload_file returns None and raises an exception on failure
s3_client.upload_file(file_name, bucket, object_name)
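A checkpoint is usually a directory, not a single file. The sketch below walks a directory and uploads each file, using the relative path as the object key. The helper name `upload_directory` and the `prefix` parameter are hypothetical. The client is passed in as an argument, and the only call made on it is `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path


def upload_directory(s3_client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir, keeping relative paths as object keys."""
    root = Path(local_dir)
    uploaded = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            # Skip the empty prefix so keys never start with a slash.
            key = "/".join(part for part in (prefix.strip("/"), relative) if part)
            s3_client.upload_file(str(path), bucket, key)
            uploaded.append(key)
    return uploaded
```

With boto3 this would be called as `upload_directory(boto3.client("s3"), "./llama-7b-qlora-instruct", bucket, "run-1")`, where the bucket and prefix are placeholders for your own values.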

Prevent Session Timeout

Most cloud environments have an idle timeout: after a certain period without interaction on the page, the environment is stopped and your results are lost. The key is to simulate browser interaction with a script. Open the browser console, then run the following script:

document.body.addEventListener('click', () => {
    console.log("click");
});

const click = () => {
    const simulate = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true,
        clientX: 100,
    });

    document.body.dispatchEvent(simulate);
}

const sleep = (delay) => new Promise((resolve) => setTimeout(resolve, delay))

const repeatedClick = async () => {
    while (true) {
        click();
        await sleep(60000);
    }
}

repeatedClick();

This will keep the session active even when you un-focus the browser window.

Conclusion

When fine-tuning or evaluating LLMs in cloud environments, several restrictions apply. This blog post includes a set of tricks and best practices to make these environments work more robustly for your projects. You learned how to inspect the hardware, libraries and binaries, then how to apply strict version pinning, and finally how to periodically and automatically save results and prevent session timeouts.
