MLflow Tutorial with an Image Recognition Example (MNIST)

Introduction

What is MLOps?

MLOps (Machine Learning Operations) is a core function of machine learning engineering that aims to simplify the deployment, maintenance, and monitoring of machine learning models in production, reliably and efficiently. MLOps requires collaboration among data scientists, DevOps engineers, and operations engineers [1].

Fig 1. Process of MLOps [2]

What is MLflow?

MLflow is an open-source MLOps platform developed by Databricks. It helps machine learning engineers track and manage models, code, and dependencies, and deploy models to production environments. In other words, MLflow can simplify the process of building, training, deploying, and monitoring machine learning models. For more details, see https://mlflow.org/docs/latest/what-is-mlflow.html.

Fig 2. MLflow components [3]

Why MLOps and MLflow?

When training neural networks or other machine learning models, we often face the challenges listed below. We could manually document model parameters in a text file or back up the model after each experiment, but this is cumbersome and hinders collaboration with others. With the help of MLOps and MLflow, we can train, deploy, and track models with far less manual effort.

Challenge 1: How do you track your model's version when you need to retrain it?
Each time we train a model, we may adjust the parameters, environment, dependencies, and datasets. Keeping track of these by hand is hard, especially when we collaborate with others.
Challenge 2: How do you deploy a model and provide services to the public?
After we train a model, we need to publish it and make it available to others. That means providing APIs and keeping the service reliable 24/7.
Challenge 3: How do you monitor the models?
Once a model is deployed, we need to keep tracking and maintaining the service.

MLflow Demo:

In this example, we use an image recognition model built with Keras/TensorFlow and the MNIST dataset. If you are interested, see [4] for a scikit-learn logistic regression example.

Installation & Setup

Step 1: Install conda
https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
Step 2: Create a virtual environment for mlflow
cd $your_folder
conda create -n mlflow (you can customize the environment name)
Step 3: Activate the conda environment
conda activate mlflow
Step 4: Install mlflow
conda install mlflow (or pip install mlflow)
Step 5: Run an MLflow server with a file store backend. The default port is 5000; you can change it with the '-p' option (e.g., mlflow server -p 5111)
mlflow server -h 0.0.0.0 --backend-store-uri /home/xxx/mlruns (you can customize the backend store path)
Step 6: Open the web UI at http://localhost:5000 to confirm that the server is running
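
If you prefer to check the server from a script instead of a browser, the sketch below probes it with the Python standard library. It assumes the tracking server exposes a /health endpoint that answers with HTTP 200 and that the server runs on the default port from Step 5; the function names are our own and are not part of MLflow.

```python
import urllib.error
import urllib.request


def health_url(server_url: str) -> str:
    """Build the health-check URL for a tracking server base URL."""
    # Assumption: the MLflow tracking server answers on <base>/health.
    return server_url.rstrip("/") + "/health"


def server_is_up(server_url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(server_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Once the server from Step 5 is running, server_is_up("http://localhost:5000") should return True; if you changed the port with '-p', pass that port instead.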

Prepare and Run the MLflow Project

Step 1: Clone the MLflow project [5][6]:
git clone https://github.com/RoboticsAndCloud/mlflow-examples.git
Step 2: Go to the Keras/TensorFlow MNIST example in mlflow-examples:
cd mlflow-examples/python/keras_tf_mnist

Here, note three files: MLproject, conda.yaml, and train.py.

MLproject: a YAML file that defines the MLflow project structure, including the project name, dependencies (in conda.yaml), parameters, and entry point. See https://mlflow.org/docs/latest/projects.html for more details.

conda.yaml: a YAML file that defines your project's dependencies.

train.py: a Python script that shows how to train your model and log the parameters to MLflow.
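
To give a feel for what a script like train.py does on the logging side, here is a minimal sketch. It is not the code from the repository: the helper names, the hyperparameter names, and the shape of the history dictionary (modeled on the dictionary Keras keeps in History.history) are assumptions for illustration. MLflow is imported inside the logging function, so it must be installed before log_run is called.

```python
def metric_rows(history):
    """Flatten {"loss": [0.5, 0.2], ...} into (name, value, epoch) tuples."""
    return [
        (name, float(value), epoch)
        for name, values in history.items()
        for epoch, value in enumerate(values)
    ]


def log_run(params, history, run_name=None):
    """Log hyperparameters and per-epoch metrics to the active tracking server."""
    import mlflow  # imported here; requires mlflow to be installed

    with mlflow.start_run(run_name=run_name) as run:
        # Hyperparameters such as {"epochs": 5, "batch_size": 128} (hypothetical).
        mlflow.log_params(params)
        # One metric point per epoch, so the dashboard can plot a curve.
        for name, value, epoch in metric_rows(history):
            mlflow.log_metric(name, value, step=epoch)
        return run.info.run_id
```

A call such as log_run({"epochs": 2, "batch_size": 128}, {"loss": [0.5, 0.25]}) would create one run with two parameters and a two-point loss curve.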

Step 3: Set the tracking URI
Here, we export the environment variable so that our results are logged to our MLflow server
export MLFLOW_TRACKING_URI=http://localhost:5000
Step 4: Run the MLflow experiment
mlflow run . --experiment-name=keras_mnist --run-name runname_first_keras_keras_mnist
Step 5: Check the MLflow dashboard at http://localhost:5000

Model Serving

Model serving exposes a trained model so that we can access it through REST API endpoints.

Step 1: Get the run ID from the run's page in the MLflow dashboard

Step 2: Serve the model
mlflow models serve -m runs:/5c6a476d67d84239b01f874241f4009f/keras-model --port 5001
Step 3: Send a recognition request to test the service

Target image file for the digit '0' (you can find samples in the MNIST dataset):

Set the tracking URI

export MLFLOW_TRACKING_URI=http://localhost:5000

Send the request

python3 keras_predict.py --model-uri runs:/5c6a476d67d84239b01f874241f4009f/keras-model --data-path /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png

Results

The output shows that '0' has the highest probability.

Send an HTTP request and check the results
Remember that the port must match the model server port, which is 5001 here

python convert_png_to_mlflow_json.py /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png | curl -X POST -H "Content-Type:application/json" -d @- http://localhost:5001/invocations
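
The same HTTP request can also be sent from Python. The sketch below uses only the standard library and makes a few assumptions: the server accepts a JSON body with an "instances" key (one of the input formats accepted by recent MLflow scoring servers), the pixel layout matches what the model expects, and the helper names are ours. The payload produced by convert_png_to_mlflow_json.py may differ, so treat this as an illustration rather than a drop-in replacement.

```python
import json
import urllib.request


def build_payload(pixels):
    """Wrap a single image (nested lists of numbers) in an 'instances' body."""
    return json.dumps({"instances": [pixels]})


def most_likely_digit(probabilities):
    """Return the index of the largest probability, i.e. the predicted digit."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def predict(pixels, url="http://localhost:5001/invocations", timeout=10.0):
    """POST one image to the model server and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(pixels).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The exact shape of the reply depends on the MLflow version, so inspect it once before passing the list of ten class probabilities to most_likely_digit.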

Model version control:

Once the model has been verified, you can proceed to register it, verify its version, and deploy it in another production environment.

Version control is crucial when deploying models in a production environment. It enables you to monitor changes and revert to a specific version if needed.

The development process can be divided into multiple stages for model verification. MLflow defines three key stages: Staging, Production, and Archived.
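
As a rough sketch of how registration and staging could look in code, the example below registers the model logged by a run and moves the new version into a stage. The registered model name is hypothetical, the helper functions are ours, and the stage-transition call is marked as deprecated in newer MLflow releases, so check the documentation for your version. Depending on the MLflow version, the model registry may also need a database-backed backend store rather than a plain file store.

```python
VALID_STAGES = ("Staging", "Production", "Archived")


def stage_model_uri(name, stage):
    """Build the URI that loads whichever version currently sits in a stage."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return f"models:/{name}/{stage}"


def register_and_stage(run_id, artifact_path, name, stage="Staging"):
    """Register a run's model under `name` and move that version to `stage`."""
    import mlflow  # imported here; requires mlflow to be installed
    from mlflow.tracking import MlflowClient

    uri = stage_model_uri(name, stage)  # validates the stage before any call
    version = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", name)
    MlflowClient().transition_model_version_stage(
        name=name, version=version.version, stage=stage
    )
    return uri
```

For the run served earlier, a call might look like register_and_stage("5c6a476d67d84239b01f874241f4009f", "keras-model", "keras_mnist"), where "keras_mnist" is a name chosen only for this illustration. The returned URI can then be passed to mlflow models serve -m in place of the runs:/ URI.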

Errors you may encounter and their solutions:

Error 1: ModuleNotFoundError: No module named 'pip._vendor.six'

Solution: Update pip and pipenv
pip install pip -U
pip install pipenv -U

Error 2: AttributeError: module 'virtualenv.create.via_global_ref.builtin.cpython.mac_os' has no attribute 'CPython2macOsFramework'

Solution: virtualenv is installed twice (once via apt and once via pip). Remove the copy installed by apt:
sudo apt purge python3-virtualenv

Error 3: mlflow.exceptions.MlflowException: Run '37011fce0ac847dbaa31efb5fabf842d' not found

Solution: The tracking URI is not set. Export it:
export MLFLOW_TRACKING_URI=http://localhost:5000
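
A common cause of Error 3 is a misspelled variable name. Environment variable names are case-sensitive on Linux, and MLflow reads the all-uppercase name MLFLOW_TRACKING_URI, so a near miss is silently ignored and the client looks for the run somewhere other than your server. The small check below is a sketch with helper names of our own; it only inspects the environment and does not call MLflow.

```python
import os

TRACKING_VAR = "MLFLOW_TRACKING_URI"  # exact, all-uppercase name read by MLflow


def tracking_uri(environ=os.environ):
    """Return the tracking URI if the exact variable name is set, else None."""
    return environ.get(TRACKING_VAR)


def misspelled_tracking_vars(environ=os.environ):
    """List names that match the variable ignoring case but are not exact."""
    return sorted(
        name
        for name in environ
        if name.upper() == TRACKING_VAR and name != TRACKING_VAR
    )
```

If tracking_uri() returns None while misspelled_tracking_vars() is non-empty, unset the misspelled variable and export the correctly spelled one.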

Summary:

This article demonstrates how to train, track, and manage a model using the MLflow platform. MLflow streamlines the design, deployment, and monitoring of machine learning models. More broadly, we believe that MLOps practices can significantly improve how work in the field of AI is developed and delivered.

References:

[1] MLOps, https://www.databricks.com/glossary/mlops
[2] What Is MLOps, https://ml-ops.org/content/mlops-principles
[3] MLflow component, https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow
[4] Getting Started With MLflow, https://saturncloud.io/blog/getting-started-with-mlflow/
[5] MLflow examples, https://github.com/amesar/mlflow-examples
[6] MLflow examples, https://github.com/RoboticsAndCloud/mlflow-examples
