Shubham Kumar
Posted on October 1, 2023
Introduction
PMML (Predictive Model Markup Language) is an XML-based format for saving trained AI/ML models so that you can use them for predictions later on, for example in production. cPMML is a library created by AmadeusITGroup to parse PMML files and run predictions in C++. In this blog, we will train a linear regression model in Python, export it as a PMML file, and then run our predictions in C++.
Creating a model file
Dependencies
We will need pandas, numpy, scikit-learn and sklearn2pmml.
pip install pandas numpy scikit-learn sklearn2pmml
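One dependency that pip will not pull in: sklearn2pmml runs a Java-based converter (JPMML-SkLearn) behind the scenes, so a Java runtime has to be reachable when you export the model. A quick pre-flight check might look like the sketch below. The helper name `java_available` is made up for this post, and it only looks for `java` on the PATH; it does not check the Java version.

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


if java_available():
    print("java found on PATH")
else:
    print("java not found on PATH, install a JRE/JDK before exporting to PMML")
```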
Imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
The model
Dataset
To keep things simple, let's train a linear regression model to match the equation y = 2x + 1. We can generate a random dataset for this equation: 100 values of X drawn uniformly from [0, 1), with a little Gaussian noise added to Y.
X = np.random.rand(100, 1)
Y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
Test/Train data
Next, we'll divide the data into test and train datasets.
df = pd.DataFrame({'X': X.flatten(), 'Y': Y.flatten()})
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
X_train = train_df[['X']]
y_train = train_df['Y']
X_test = test_df[['X']]
y_test = test_df['Y']
Training the model
To train the model, we take LinearRegression from scikit-learn, wrap it in a PMMLPipeline, and fit it on the training data we generated above. We can also check the mean squared error (MSE) on the test set to get an idea of the model's accuracy.
pipeline = PMMLPipeline([
    ("regressor", LinearRegression())
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
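To make it clearer what the regressor is estimating, here is a dependency-free sketch of ordinary least squares for a single feature, run on data generated the same way as above. This is not how scikit-learn implements LinearRegression internally; it is just the closed-form solution for one input. The helper names `fit_line` and `mse` are made up for this illustration.

```python
import random


def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(ys, preds):
    """Mean of the squared differences between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Same recipe as the dataset above: y = 2x + 1 plus Gaussian noise (std 0.1)
rng = random.Random(42)
xs = [rng.random() for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
print(f"slope={slope:.3f} intercept={intercept:.3f} mse={mse(ys, preds):.4f}")
```

The slope and intercept should land near 2 and 1, and the MSE should be close to the noise variance (0.1 squared, so about 0.01). An MSE far above that would suggest something went wrong in the data or the fit.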
Saving the pmml file
If you are satisfied with the performance of your model, you can export it as a PMML file. We will save the model as lr_model.pmml.
sklearn2pmml(pipeline, "lr_model.pmml", with_repr = True)
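Since a PMML file is plain XML, you can open it and check that the exported numbers look right before handing it to C++. The sketch below parses a small hand-written PMML snippet, not the actual output of sklearn2pmml, and assumes the linear model is stored as a `RegressionTable` with one `NumericPredictor` per feature, which is how the PMML specification describes regression models. The helper name `read_linear_terms` is made up for this post, and the `{*}` namespace wildcard needs Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an exported file (illustrative values only)
SAMPLE_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Y" optype="continuous" dataType="double"/>
    <DataField name="X" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="Y" usageType="target"/>
      <MiningField name="X"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="X" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def read_linear_terms(pmml_text):
    """Return (intercept, {feature: coefficient}) from a PMML regression."""
    root = ET.fromstring(pmml_text.strip())
    # "{*}" matches the element in any XML namespace (PMML version varies)
    table = root.find(".//{*}RegressionTable")
    if table is None:
        raise ValueError("no RegressionTable element found")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("{*}NumericPredictor")
    }
    return intercept, coefficients


print(read_linear_terms(SAMPLE_PMML))
```

To try it on the real export, pass the file contents instead, for example `read_linear_terms(open("lr_model.pmml").read())`. The intercept and the coefficient for X should be close to 1 and 2.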
Using the model file
The main focus of this blog is using the model in a C++ program. For this, you will need to install the cPMML library.
Installing cPMML
To install the library on your system, you just need to run the command below. The install script runs cmake, so you should have cmake installed on your system.
git clone https://github.com/AmadeusITGroup/cPMML.git && cd cPMML && ./install.sh
For Mac M1
I ran into some problems while installing this on a Mac M1. Here are the steps that got it installed without trouble.
- Ensure you have the latest version of cmake installed on your system.
- Edit the install.sh script to remove the -j 4 flag from the cmake -j 4 .. command. This turns off the parallel build.
- The last line of the install.sh script is sudo ldconfig. Change this to sudo update_dyld_shared_cache. This step refreshes the dynamic linker's cache so that the installed .dylib or .so library files can be found.
Running the predictions
Include the library
The first thing is to include the library header, along with the standard headers we will use.
#include "cPMML.h"
#include <iostream>
#include <string>
#include <unordered_map>
Load the model
Then you can load the model.
int main() {
    cpmml::Model model("lr_model.pmml");
    return 0;
}
Start predictions
The cPMML library takes its input as a std::unordered_map from string to string, mapping each feature name to its value. For us, there is only one input, X.
int main() {
    cpmml::Model model("lr_model.pmml");

    // This should yield a value close to 1
    std::unordered_map<std::string, std::string> input1 = {
        {"X", "0"}
    };

    // This should yield a value close to 21
    std::unordered_map<std::string, std::string> input2 = {
        {"X", "10"}
    };

    std::cout << "X = 0 Y = " << model.predict(input1) << '\n';
    std::cout << "X = 10 Y = " << model.predict(input2) << '\n';
    return 0;
}
Compilation
You can compile the code by linking against the cPMML library.
> g++ -std=c++11 predict.cpp -o predict.o -lcPMML
> ./predict.o
X = 0 Y = 0.967265
X = 10 Y = 21.369305
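Because the model is a straight line, the two outputs above are enough to recover the slope and intercept that were fitted. The short sketch below does that arithmetic; the helper name `line_from_two_points` is made up for this post.

```python
def line_from_two_points(x0, y0, x1, y1):
    """Recover slope and intercept of the line through two points."""
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


# The two (X, Y) pairs printed by the C++ program above
slope, intercept = line_from_two_points(0.0, 0.967265, 10.0, 21.369305)
print(f"slope={slope:.6f} intercept={intercept:.6f}")
# prints: slope=2.040204 intercept=0.967265
```

This also explains why the second prediction is a bit further from 21 than the first is from 1. The training data only covers X in [0, 1), so X = 10 is an extrapolation, and the small error in the slope (about 0.04) gets multiplied by 10.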
Conclusion
In this blog, we saw how to store your model as a PMML file and load it in C++ using the cPMML library. You can view the code for the above here.