Huggingface Transformers Pytorch Tutorial: Load, Predict and Serve/Deploy

wjiuhe

Posted on April 16, 2022

Many of you have probably heard of BERT, or of the Transformer architecture in general.
And you may also know Hugging Face.

In this tutorial, let's play with Hugging Face's PyTorch BERT model and serve it through a REST API.

How does the model work?

Given a sentence with a [MASK] token as input, the model predicts the missing word:

Input:

Paris is the [MASK] of France.

Output:

Paris is the capital of France.

Cool~ Let's try this out now!

Prerequisites

For Mac users

If you're working on an M1 Mac like me, you need to install cmake and Rust first:

brew install cmake
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install dependencies

You can install the dependencies using pip (this assumes you already have PyTorch installed):

pip install tqdm boto3 requests regex sentencepiece sacremoses

Or you can use a Docker image instead:

docker run -it -p 8000:8000 -v $(pwd):/opt/workspace huggingface/transformers-pytorch-cpu:4.18.0 bash

Load the model

This will load the tokenizer and the model. It may take some time to download the weights.

import torch

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)
# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)
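By the way, torch.hub is not the only way to get these weights. If you already have the transformers library installed, the following sketch (an equivalent alternative, not the route this tutorial uses) loads the same tokenizer and model directly:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# same artifacts as the torch.hub calls above
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
masked_lm_model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")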

Define the predict function

The input text is: Paris is the [MASK] of France.

input_text = "Paris is the [MASK] of France."

First, we need to tokenize the input text:

tokens = tokenizer(input_text)
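The tokenizer returns a dict-like object with input ids, token type ids, and an attention mask. The exact ids below are illustrative, but for bert-base-cased the special tokens [CLS], [SEP], and [MASK] map to 101, 102, and 103:

print(tokens)
# roughly: {'input_ids': [101, ..., 103, ..., 102],
#          'token_type_ids': [0, 0, ...],
#          'attention_mask': [1, 1, ...]}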

Next, let's find the index of each [MASK] token:

mask_index = [
    i
    for i, token_id in enumerate(tokens["input_ids"])
    if token_id == tokenizer.mask_token_id
]
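A quick sanity check; the exact position depends on how the sentence is tokenized, so the value below is only illustrative:

print(tokenizer.mask_token, tokenizer.mask_token_id)  # [MASK] 103
print(mask_index)  # e.g. [4]: [CLS] Paris is the [MASK] ...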

Prepare the tensors:

segments_tensors = torch.tensor([tokens["token_type_ids"]])
tokens_tensor = torch.tensor([tokens["input_ids"]])

Predict:

with torch.no_grad():
    predictions = masked_lm_model(
        tokens_tensor, token_type_ids=segments_tensors
    )
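The first element of the model output holds the raw logits, one score per vocabulary entry for every position. The shape values below are illustrative; the sequence length depends on the tokenized input:

print(predictions[0].shape)
# e.g. torch.Size([1, 9, 28996]): (batch, sequence length, vocab size)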

Now, let's have a look at the result:

pred_tokens = torch.argmax(predictions[0][0], dim=1)

# replace the initial input text's [MASK] tokens with the predicted tokens
for i in mask_index:
    tokens["input_ids"][i] = pred_tokens[i]
tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

Output:

'Paris is the capital of France.'

Let's organize the code into a predict function:

def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask index
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensor
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely token for each position
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the initial input text's masks with the predicted tokens
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i]
    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

Run:

predict("Paris is the [MASK] of France.")

Output:

'Paris is the capital of France.'
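If you'd like to peek at the runner-up candidates too, here is a small variation (a sketch of mine, not part of the tutorial) that uses torch.topk to list the k most likely fillers for the first [MASK]:

def predict_topk(input_text, k=5):
    tokens = tokenizer(input_text)
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]
    tokens_tensor = torch.tensor([tokens["input_ids"]])
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )
    # top-k candidate token ids at the first masked position
    top_ids = torch.topk(predictions[0][0][mask_index[0]], k).indices
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())

Running predict_topk("Paris is the [MASK] of France.") should rank 'capital' at or near the top of the list.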

Serve it through REST API

First, let's install Pinferencia.

pip install "pinferencia[uvicorn]"

If you haven't heard of Pinferencia, check out its GitHub page https://github.com/underneathall/pinferencia or its homepage https://pinferencia.underneathall.app/. It's an amazing library that helps you deploy your model with ease.

Let's save our predict function into a file app.py and add some lines to register it.

import torch
from pinferencia import Server

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)
# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)


def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask index
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensor
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely token for each position
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the initial input text's masks with the predicted tokens
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i]
    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)


service = Server()
service.register(model_name="transformer", model=predict)


Run the service, and wait for it to load the model and start the server:

uvicorn app:service --reload

Test the service:

Using curl:

curl --location --request POST 'http://127.0.0.1:8000/v1/models/transformer/predict' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": "Paris is the [MASK] of France."
}'

Response:

{
    "model_name":"transformer",
    "data":"Paris is the capital of France."
}
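You can also call the endpoint from Python. Here is a minimal client sketch using the requests library, assuming the server is still running locally on port 8000:

import requests

response = requests.post(
    "http://127.0.0.1:8000/v1/models/transformer/predict",
    json={"data": "Paris is the [MASK] of France."},
)
print(response.json())
# {'model_name': 'transformer', 'data': 'Paris is the capital of France.'}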

Cool~~ But wait, it gets even cooler:

You can use the Swagger UI at http://127.0.0.1:8000 (the server's address) to try the prediction:
Swagger UI
