AWS + Falcon Quickstart

Okay, so I hadn't used SageMaker in a looooong time (maybe 4 years or so) and I have to say, it feels like a whole new ecosystem. It's definitely a lot more intuitive and has waay more features. And now, with easy access to foundation models as well as models from HuggingFace, it's starting to climb up my list of preferred ML/AI platforms.

And in this post I wanna show you how to deploy and use the shiny new Falcon models.

What is SageMaker

It's basically AWS's fully managed, end-to-end ML platform. This includes an IDE (Jupyter notebooks), storage, foundation models and one-click deployments. All of the infrastructure management is handled by AWS.

Roles and Permissions

The first thing you're going to need is a role that has the AmazonSageMakerFullAccess policy attached to it. You can create a new one or use a default role you already have. Give it a name you'll remember so you can use it later on.

In my case, I'm gonna go with:

AmazonSageMakerRole-experimentation
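
If you'd rather script this step than click through the console, here's a rough sketch using boto3. The helper names (sagemaker_trust_policy, create_sagemaker_role) are my own and not part of any AWS API, and it assumes your credentials are allowed to create IAM roles.

```python
import json

# AWS-managed policy mentioned above
SAGEMAKER_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def sagemaker_trust_policy():
    """Trust policy that lets the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_sagemaker_role(iam_client, role_name):
    """Create the role, attach the managed policy, and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=SAGEMAKER_FULL_ACCESS_ARN,
    )
    return response["Role"]["Arn"]


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   arn = create_sagemaker_role(boto3.client("iam"), "AmazonSageMakerRole-experimentation")
```

The IAM client is passed in as an argument so you can try the helper with a stub before pointing it at a real account.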

Creating a Domain

So from AWS:

A domain includes an associated Amazon Elastic File System (EFS) volume; a list of authorized users; and a variety of security, application, policy, and Amazon Virtual Private Cloud (VPC) configurations. Each user in a domain receives a personal and private home directory within the EFS for notebooks, Git repositories, and data files.

Essentially, it's your home for everything you'll need to train, finetune, build and deploy models, either as an individual or collaboratively in a team.

Go ahead and navigate to SageMaker > Domains and create a domain. It'll take you to a setup screen.

Go with the quick setup for the purpose of this post.

Here you can pick any name you want for your domain and for the user you'll create. Assign that user the role you created earlier.

Once you've chosen your names and selected your role, it'll take some time to spin up your domain.
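
If you don't feel like refreshing the console while you wait, something like the following could poll the domain status for you. It's a sketch that assumes you pass in a boto3 SageMaker client and the domain ID shown in the console. The function name wait_for_domain and the polling defaults are my own choices.

```python
import time


def wait_for_domain(sm_client, domain_id, poll_seconds=30, max_polls=40):
    """Poll describe_domain until the domain is InService, fails, or we give up."""
    for _ in range(max_polls):
        status = sm_client.describe_domain(DomainId=domain_id)["Status"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Domain {domain_id} failed to create")
        # any other status (e.g. Pending) means it is still spinning up
        time.sleep(poll_seconds)
    raise TimeoutError(f"Domain {domain_id} was not ready after {max_polls} polls")


# Example usage (needs boto3 and AWS credentials; the domain ID is a placeholder):
#   import boto3
#   wait_for_domain(boto3.client("sagemaker"), "d-xxxxxxxxxxxx")
```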

Deploying a model

Now that your domain is running, head into Users and select Launch. You'll have a few options; go for Studio.

Create a new notebook and we'll use the following code:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

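try:
    # inside SageMaker this returns the execution role attached to your user
    role = sagemaker.get_execution_role()
except ValueError:
    # outside SageMaker, look the role up by name (use the role you created earlier)
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='AmazonSageMakerRole-experimentation')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models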
hub = {
    'HF_MODEL_ID':'tiiuae/falcon-7b',
    'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
    "inputs": "Once upon a time in Narnia ",
})

This is the general code you'll use for all HuggingFace models. You'll just need to change the details, such as which model, which AWS instance type, and which Python and library versions.

Take note: in the deploy call, you're choosing 1 instance of type ml.m5.xlarge. You can use different options here, but depending on your account you might need to adjust your quotas for resources like an ml.m5.xlarge instance!
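
To check what your account currently allows before deploying, you could list the SageMaker quotas through the Service Quotas API. This is only a sketch: find_endpoint_quotas is a name I made up, and the filter assumes quota names look something like "ml.m5.xlarge for endpoint usage", which may not hold in every account or region.

```python
def find_endpoint_quotas(quotas_client, instance_type):
    """Return {quota name: value} for SageMaker endpoint quotas mentioning instance_type."""
    matches = {}
    paginator = quotas_client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            name = quota["QuotaName"]
            # assumption: endpoint quotas carry the instance type and the word "endpoint" in their name
            if instance_type in name and "endpoint" in name.lower():
                matches[name] = quota["Value"]
    return matches


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   print(find_endpoint_quotas(boto3.client("service-quotas"), "ml.m5.xlarge"))
```

If the value that comes back is 0, you'll need to request an increase before the deploy call can succeed.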

Now you're ready to run and you should get some text generated.
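
If you want a bit more control over the output, the request can also carry generation parameters. The sketch below builds such a payload and pulls the text out of the response. The helper names are mine, and it assumes the response is a list of dicts with a generated_text key, which is the usual shape for text-generation models but worth confirming against your own output.

```python
def build_payload(prompt, max_new_tokens=50, temperature=0.7):
    """Build a request body with a few common generation parameters."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": True,
        },
    }


def extract_text(response):
    """Collect the generated strings, assuming a list of {'generated_text': ...} dicts."""
    return [item["generated_text"] for item in response]


# Example usage with the predictor from the notebook:
#   response = predictor.predict(build_payload("Once upon a time in Narnia "))
#   print(extract_text(response))
```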

The endpoint

So far you've just run it via a Jupyter notebook. But don't worry, the deploy step has actually created an API endpoint you can use. Go to Deployments > Endpoints and you'll see the one you just deployed.

Once you click into it, you'll be able to test the inference endpoint, as well as see various bits of information about it, such as traffic patterns. You'll also see the actual endpoint that you can now use within your LLM-powered applications.
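
From an application outside the notebook, one way to call the endpoint is the SageMaker runtime client in boto3. Here's a minimal sketch, assuming the endpoint accepts the same JSON body as predictor.predict above. The generate function name is my own, and the endpoint name in the usage comment is a placeholder for the one shown in the console.

```python
import json


def generate(runtime_client, endpoint_name, prompt):
    """Send a prompt to a deployed endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    # the body is a stream, so read it before decoding
    return json.loads(response["Body"].read())


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   print(generate(runtime, "your-endpoint-name", "Once upon a time in Narnia "))
```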

Bits and pieces

This was a quickstart and SageMaker has a lot more to it. So I would recommend playing around with it as much as possible and deploying different models. Just remember to shut everything down when you're done!
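
For the shutting down part, the quickest route from the notebook is predictor.delete_model() followed by predictor.delete_endpoint(). If the notebook session is gone, a sketch like this could clean up by endpoint name instead. delete_endpoint_stack is a made-up helper name, and it assumes you pass in a boto3 SageMaker client.

```python
def delete_endpoint_stack(sm_client, endpoint_name):
    """Delete an endpoint plus its endpoint config and the models behind it."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]

    # look up the models before deleting anything, since the config is what references them
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return model_names


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   delete_endpoint_stack(boto3.client("sagemaker"), "your-endpoint-name")
```

Note this only covers the endpoint. The Studio apps and the domain itself keep running until you stop or delete them from the console.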
