Different ways of making REST Endpoint inference with Sagemaker SDK

erm0rden

Ayush Rastogi

Posted on May 2, 2022

Different ways of making REST Endpoint inference with Sagemaker SDK

In this article we are going to see two ways of making inference from a machine learning model endpoint using Sagemaker SDK. The machine learning model is trained on 'Amazon Customer Reviews Dataset' and can predict customer review rating on a scale of 1-5. As a part of transfer learning, I have used BERT model with pre-training carried out through DistilBERT, using Hugging face transformer, which provides bi-directional attention, and TensorFlow. Details regarding model implementation are shown below.

High-level ML model details

The first step for the inference script is to include the boiler plate details like session, bucket, role, region, account ID and Sagemaker object.

Boiler plate Sagemaker setup code

The second step is to include the details completed during the model deployment stage. This will include the training job name, model name, and endpoint details.

Retrieving model endpoint

Using the deployment stage details, a predictor object needs to be created which includes session details and the endpoint name. We are using JSONserializer and JSONdeserializer to format the input, interpret it and then deserialize it into an object type.

Creating predictor object

Category 1: Real-time/Ad-hoc predictions
Once this predictor is created, it can be used to make ad-hoc predictions by providing inputs as features, shown below. Here function predictor.predict() is being used to make real-time predictions against Sagemaker endpoints with python objects. Four examples of Amazon reviews have been shown here.

Real-time predictions

Category 2: Batch predictions
The reviews are stored in a .tsv file (tab separated value) and can be processed all at once. Predictions of 1000 sample reviews out of 102,084 reviews, were made which had a duration of 5 mins 45 seconds based on the compute being used. By providing the argument of 10, on the df_sample_reviews.head() call, we can retrieve the small number and validate the predictions. The column which serves as an input is 'review body' and predictions are compared against 'star rating'.

Batch predictions

Predictions can also be invoked model as a SageMaker Endpoint directly using the following HTTP request/response syntax. This approach is primarily being used when models are deployed as microservices in a production environment.

References:

💖 💪 🙅 🚩
erm0rden
Ayush Rastogi

Posted on May 2, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related