Hugging Face Transformers PyTorch Tutorial: Load, Predict and Serve/Deploy
wjiuhe
Posted on April 16, 2022
Many of you must have heard of BERT, or Transformers. And you may also know Hugging Face.
In this tutorial, let's play with its PyTorch transformer model and serve it through a REST API.
How does the model work?
Given an incomplete sentence as input, the model predicts the masked word:
Input:
Paris is the [MASK] of France.
Output:
Paris is the capital of France.
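If you just want a quick preview before building it yourself, the high-level pipeline API from the transformers package can do this in a few lines (a minimal sketch, assuming transformers and torch are installed; the rest of this tutorial loads the same model through torch.hub instead):

# quick preview with the high-level pipeline API
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# the pipeline returns the top candidates; [0] is the most likely one
print(fill_mask("Paris is the [MASK] of France.")[0]["sequence"])
# 'Paris is the capital of France.'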
Cool~ Let's try this out now~
Prerequisites
For Mac users
If you're working on an M1 Mac like me, you need to install cmake and rust:
brew install cmake
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install dependencies
You can install the dependencies using pip (this assumes torch itself is already installed):
pip install tqdm boto3 requests regex sentencepiece sacremoses
Or you can use a Docker image instead:
docker run -it -p 8000:8000 -v $(pwd):/opt/workspace huggingface/transformers-pytorch-cpu:4.18.0 bash
Load the model
This will load the tokenizer and the model. It may take some time to download.
import torch

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)

# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)
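Before moving on, here is a quick optional sanity check that both objects loaded correctly. It is also good practice to put the model in evaluation mode, since we only run inference (the exact printed values are specific to bert-base-cased):

# the token the model treats as a blank, and its id in the vocabulary
print(tokenizer.mask_token, tokenizer.mask_token_id)  # [MASK] 103

# the concrete model class resolved by torch.hub
print(type(masked_lm_model).__name__)  # BertForMaskedLM

# disable dropout for inference
masked_lm_model.eval()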
Define the predict function
The input text is: Paris is the [MASK] of France.
input_text = "Paris is the [MASK] of France."
First, we need to tokenize the input text:
tokens = tokenizer(input_text)
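The tokenizer returns a dict-like object holding input_ids, token_type_ids, and attention_mask. Converting the ids back to tokens shows how the sentence was split, including the special [CLS] and [SEP] tokens (illustrative output):

print(tokenizer.convert_ids_to_tokens(tokens["input_ids"]))
# ['[CLS]', 'Paris', 'is', 'the', '[MASK]', 'of', 'France', '.', '[SEP]']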
Let's have a look at the masked index:
mask_index = [
    i
    for i, token_id in enumerate(tokens["input_ids"])
    if token_id == tokenizer.mask_token_id
]
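For our sentence, mask_index is [4]: the tokenizer prepends [CLS] at index 0, so the [MASK] token lands at position 4.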
Prepare the tensors:
segments_tensors = torch.tensor([tokens["token_type_ids"]])
tokens_tensor = torch.tensor([tokens["input_ids"]])
Predict:
with torch.no_grad():
    predictions = masked_lm_model(
        tokens_tensor, token_type_ids=segments_tensors
    )
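The first element of the model output holds the logits, with shape [batch_size, sequence_length, vocab_size]. A quick way to confirm (the vocabulary size is specific to bert-base-cased):

print(predictions[0].shape)
# torch.Size([1, 9, 28996]): 1 sentence, 9 tokens, 28996-word vocabulary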
Now, let's have a look at the result:
pred_tokens = torch.argmax(predictions[0][0], dim=1)

# replace the masks in the initial input text with the predicted tokens
for i in mask_index:
    tokens["input_ids"][i] = pred_tokens[i].item()

tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)
Output:
'Paris is the capital of France.'
Let's organize the code into a predict function:
def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask indexes
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensors
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely prediction for each position
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the masks in the initial input text with the predicted tokens
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i].item()

    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)
Run:
predict("Paris is the [MASK] of France.")
Output:
'Paris is the capital of France.'
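Since mask_index collects every [MASK] position, the same function also handles sentences with more than one mask; each one is filled independently with its most likely token (the exact words depend on the model):

predict("The [MASK] sat on the [MASK].")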
Serve it through a REST API
First, let's install Pinferencia.
pip install "pinferencia[uvicorn]"
If you haven't heard of Pinferencia, check out its GitHub page https://github.com/underneathall/pinferencia or its homepage https://pinferencia.underneathall.app/. It's an amazing library that helps you deploy your model with ease.
Let's save our predict function into a file app.py and add a few lines to register it:
import torch

from pinferencia import Server

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)

# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)


def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask indexes
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensors
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely prediction for each position
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the masks in the initial input text with the predicted tokens
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i].item()

    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)


service = Server()
service.register(model_name="transformer", model=predict)
Run the service, and wait for it to load the model and start the server:
uvicorn app:service --reload
Test the service:
Using curl:
curl --location --request POST 'http://127.0.0.1:8000/v1/models/transformer/predict' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "data": "Paris is the [MASK] of France."
    }'
Response:
{
    "model_name": "transformer",
    "data": "Paris is the capital of France."
}
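You can also call the endpoint from Python, e.g. with the requests library (a minimal sketch against the same endpoint the curl example uses):

import requests

response = requests.post(
    "http://127.0.0.1:8000/v1/models/transformer/predict",
    json={"data": "Paris is the [MASK] of France."},
)
print(response.json()["data"])  # Paris is the capital of France.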
Cool~~ And it gets even cooler:
You can use the Swagger UI at http://127.0.0.1:8000 (the server's address) to try the prediction.