MLflow Tutorial with Image Recognition Example (MNIST)
Fei
Posted on October 31, 2023
Introduction
What is MLOps?
MLOps (Machine Learning Operations) is a core function of machine learning engineering that aims to simplify the deployment, maintenance, and monitoring of machine learning models in production, reliably and efficiently. MLOps requires collaboration among data scientists, DevOps engineers, and operations engineers [1].
Fig 1. Process of MLOps [2]
What is MLflow?
MLflow is an open source MLOps platform developed by Databricks. It helps machine learning engineers track and manage the models, code, dependencies, as well as deploy the models to the production environment. In other words, MLflow can simplify the process of building, training, deploying, and monitoring machine-learning models. For more details, please check https://mlflow.org/docs/latest/what-is-mlflow.html .
Why MLOps and MLflow?
When training neural networks or other machine learning models, we often face the challenges listed below. We could manually document model parameters in a text file or back up the model after each experiment, but this is cumbersome and hinders collaboration with others. With the assistance of MLOps and MLflow, we can train, deploy, and track models with far less manual effort.
Challenge 1: How to track your model's version when you need to retrain your models?
Between training runs we adjust the parameters, environments, dependencies, and datasets. Keeping track of which model came from which combination is hard, especially when we need to collaborate with others.
Challenge 2: How to deploy a model and provide services to the public?
After we train a model, we need to publish it and provide it as a service to others. That means providing APIs and keeping the service reliable 24/7.
Challenge 3: How to monitor the models?
Once the model is deployed, we still need to track and maintain the service.
MLflow Demo:
In this example, we use an image recognition model based on Keras/TensorFlow and the MNIST dataset. If you are interested, see [4] for the sklearn_logistic_regression example.
Installation & Setup
Step 1. Install conda
https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
Step 2. Create a virtual environment for mlflow
cd $your_folder
conda create -n mlflow (you can customize the environment name)
Step 3: Activate the conda environment
conda activate mlflow
Step 4: Install mlflow
conda install mlflow (or pip install mlflow)
Step 5: Run an MLflow server with a file store backend. The default port is 5000; you can change it with the ‘-p’ option (e.g., mlflow server -p 5111)
mlflow server -h 0.0.0.0 --backend-store-uri /home/xxx/mlruns (you can customize the backend store path)
Step 6: Open the web UI at http://localhost:5000
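If you prefer to check the server from a script rather than a browser, the sketch below polls it over HTTP. It assumes the server answers on a /health path, as recent MLflow releases do; if your version does not, requesting the base URL instead should serve the same purpose. The function names are our own and are not part of MLflow.

```python
import urllib.error
import urllib.request


def health_url(tracking_uri: str) -> str:
    """Build the health-check URL for a tracking server base URI."""
    return tracking_uri.rstrip("/") + "/health"


def server_is_up(tracking_uri: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, ...
        return False
```

Calling server_is_up("http://localhost:5000") after Step 5 should return True once the server has finished starting.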
Prepare and Run the MLflow Project
Step 1: Clone the MLflow project [5][6]:
git clone https://github.com/RoboticsAndCloud/mlflow-examples.git
Step 2: Go to the mlflow-examples - Keras/TensorFlow - MNIST example:
cd mlflow-examples/python/keras_tf_mnist
Here, notice three files: MLproject, conda.yaml, and train.py
MLproject: a YAML file that defines the MLflow project structure, including the project name, dependencies (in conda.yaml), parameters, and entry point. Check https://mlflow.org/docs/latest/projects.html for more details.
conda.yaml: a YAML file that defines your project’s dependencies
train.py: a Python script that shows how to train your model and log the parameters to MLflow
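To give a feel for what the logging part of a training script can look like, here is a minimal sketch. It is not the train.py from the repository; the parameter names (epochs, batch_size) and the helper names are our own assumptions, and only the mlflow.start_run, mlflow.log_params, and mlflow.log_metrics calls come from the MLflow API.

```python
from typing import Optional


def build_params(epochs: int, batch_size: int) -> dict:
    """Collect the hyperparameters we want recorded for this run (hypothetical names)."""
    return {"epochs": epochs, "batch_size": batch_size}


def log_training_run(params: dict, metrics: dict, run_name: Optional[str] = None) -> str:
    """Log params and metrics to the configured tracking server; return the run ID."""
    import mlflow  # imported lazily so build_params works without MLflow installed

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)    # one call for the whole dict
        mlflow.log_metrics(metrics)  # e.g. loss/accuracy from model.evaluate()
        return run.info.run_id
```

In a real script you would train the Keras model between building the parameters and logging the metrics, for example log_training_run(build_params(5, 128), {"test_acc": acc}), where acc is whatever your evaluation step returns.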
Step 3: Set the tracking URI
Here, we can export the environment variable to ensure our results are logged to our MLflow server
export MLFLOW_TRACKING_URI=http://localhost:5000
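Environment variable names are case-sensitive on Linux, and MLflow reads the variable spelled MLFLOW_TRACKING_URI in all capitals. A variable with a different capitalization is silently ignored, and the results then go to MLflow's local default store instead of your server. The sketch below uses only the standard library to resolve the URI and to spot such near-miss spellings; the function names and the fallback value are our own choices for this tutorial.

```python
import os

ENV_VAR = "MLFLOW_TRACKING_URI"  # must be spelled exactly like this


def resolve_tracking_uri(env=None, default="http://localhost:5000"):
    """Return the tracking URI from the environment, or this tutorial's server."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR) or default


def misspelled_variants(env=None):
    """List variables that match MLFLOW_TRACKING_URI only when case is ignored."""
    env = os.environ if env is None else env
    return [key for key in env if key.upper() == ENV_VAR and key != ENV_VAR]
```

Running misspelled_variants() before an experiment is a cheap way to catch a typo in the export line.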
Step 4: Run the MLflow experiment
mlflow run . --experiment-name=keras_mnist --run-name runname_first_keras_keras_mnist
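If you want to launch several runs from a script, for instance to try different hyperparameters, you can assemble the same command in Python. This is only a sketch: the helper names are ours, and any parameter you pass with -P must be one that the project's MLproject file actually declares (the epochs key mentioned below is a hypothetical example).

```python
import shlex
import subprocess


def mlflow_run_command(project_dir, experiment_name, run_name, params=None):
    """Build the argument list for `mlflow run`, adding one -P flag per parameter."""
    cmd = ["mlflow", "run", project_dir,
           "--experiment-name", experiment_name,
           "--run-name", run_name]
    for key, value in (params or {}).items():
        cmd += ["-P", f"{key}={value}"]
    return cmd


def launch(cmd):
    """Execute the command; raises CalledProcessError if the run fails."""
    subprocess.run(cmd, check=True)


cmd = mlflow_run_command(".", "keras_mnist", "runname_first_keras_keras_mnist")
print(shlex.join(cmd))  # show the assembled command line before launching it
```

Call launch(cmd) from inside the project folder to start the run. Passing params={"epochs": 3} would append -P epochs=3, provided the project defines such a parameter.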
Step 5: Check the MLflow dashboard at http://localhost:5000
Model Serving
Model Serving exposes the models, allowing us to access the service through REST API endpoints.
Step 1: Get the run ID (it is shown on the run's page in the MLflow dashboard)
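The run ID can also be looked up programmatically. The sketch below assumes a reasonably recent MLflow release in which mlflow.search_runs accepts experiment_names and returns a pandas DataFrame with a run_id column. It also assumes the model was logged under the artifact path keras-model, as in the serve command of this tutorial. The helper names are our own.

```python
def latest_run_id(experiment_name: str) -> str:
    """Return the ID of the most recently started run in an experiment."""
    import mlflow  # lazy import: only needed when querying the server

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )
    if runs.empty:
        raise LookupError(f"no runs found in experiment {experiment_name!r}")
    return runs.iloc[0]["run_id"]


def runs_uri(run_id: str, artifact_path: str = "keras-model") -> str:
    """Build the runs:/ URI accepted by `mlflow models serve -m`."""
    return f"runs:/{run_id}/{artifact_path}"
```

With the tracking URI exported, runs_uri(latest_run_id("keras_mnist")) gives the value to pass to the -m option.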
Step 2: Serve the model
mlflow models serve -m runs:/5c6a476d67d84239b01f874241f4009f/keras-model --port 5001
Step 3: Send a recognition request and test the service
Target image file '0' (you can find some in the MNIST dataset):
Set the tracking URI
export MLFLOW_TRACKING_URI=http://localhost:5000
Send request
python3 keras_predict.py --model-uri runs:/5c6a476d67d84239b01f874241f4009f/keras-model --data-path /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png
Results
We can see that ‘0’ has the highest probability.
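Reading off the prediction amounts to taking the index of the largest value. The sketch below assumes the model returns one probability per digit class (ten values for MNIST). The numbers are made up for illustration and are not the output of the run above.

```python
def most_likely_digit(probabilities):
    """Return (digit, probability) for the highest-scoring class."""
    digit = max(range(len(probabilities)), key=probabilities.__getitem__)
    return digit, probabilities[digit]


# Hypothetical probabilities for classes 0..9, for illustration only.
example = [0.91, 0.00, 0.02, 0.00, 0.01, 0.01, 0.03, 0.00, 0.01, 0.01]
digit, prob = most_likely_digit(example)
print(f"predicted digit: {digit} (p={prob:.2f})")
```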
Send HTTP request and Results
Remember, the port must be the same as the model server port, which is 5001 here.
python convert_png_to_mlflow_json.py /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png | curl -X POST -H "Content-Type:application/json" -d @- http://localhost:5001/invocations
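The same request can be sent from Python with the standard library alone. This sketch is not the repository's convert_png_to_mlflow_json.py. It assumes the scoring server accepts the TensorFlow-Serving-style "instances" JSON format, and that the model expects each image as a flat list of pixel values; both depend on your MLflow version and on how the model was trained, so adjust the payload if the server rejects it.

```python
import json
import urllib.request


def build_payload(pixels):
    """Wrap one image (a flat list of pixel values) in a JSON scoring request."""
    return json.dumps({"instances": [pixels]})


def invoke(payload: str, url: str = "http://localhost:5001/invocations"):
    """POST the payload to the model server and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Loading the PNG into a list of pixels is left out here; any image library can do it, as long as the preprocessing matches what the training script used.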
Model version control:
Once the model has been verified, you can proceed to register it, verify its version, and deploy it in another production environment.
Version control is crucial when deploying models in a production environment. It enables you to monitor changes and revert to a specific version if needed.
The development process can be segmented into multiple stages for model verification. MLflow defines three key stages: Staging, Production, and Archived.
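Registration and stage changes can be scripted as well. The sketch below uses mlflow.register_model and MlflowClient.transition_model_version_stage; the helper names and the registered model name you pass in are your own choice. Two caveats to check against your installed version: some older releases need a database-backed store for the model registry, and newer releases mark stages as deprecated in favour of aliases.

```python
STAGES = ("None", "Staging", "Production", "Archived")  # stage names are case-sensitive


def check_stage(stage: str) -> str:
    """Validate a stage name before sending it to the registry."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return stage


def register_and_promote(model_uri: str, name: str, stage: str = "Staging") -> str:
    """Register the model at model_uri under name, move it to stage, return its version."""
    import mlflow  # lazy import: only needed when talking to the registry
    from mlflow.tracking import MlflowClient

    check_stage(stage)
    version = mlflow.register_model(model_uri, name).version
    MlflowClient().transition_model_version_stage(name=name, version=version, stage=stage)
    return version
```

For example, register_and_promote("runs:/<run_id>/keras-model", "keras_mnist") would create a new version of a registered model called keras_mnist and move it to Staging, where <run_id> is the ID of your own run.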
Errors you may encounter and their solutions:
- Error 1
ModuleNotFoundError: No module named 'pip._vendor.six'
Solution: You need to update pip and pipenv
pip install pip -U
pip install pipenv -U
- Error 2
AttributeError: module 'virtualenv.create.via_global_ref.builtin.cpython.mac_os' has no attribute 'CPython2macOsFramework'
Solution: virtualenv is installed twice (once via apt and once via pip). Remove the apt copy, python3-virtualenv, with sudo apt purge python3-virtualenv
- Error 3
mlflow.exceptions.MlflowException: Run '37011fce0ac847dbaa31efb5fabf842d' not found
Solution: The tracking URI is not set. Export it:
export MLFLOW_TRACKING_URI=http://localhost:5000
Summary:
This article demonstrates how to train, track, serve, and version a model using the MLflow platform. MLflow simplifies the design, deployment, and monitoring of machine learning models. Furthermore, we posit that MLOps can significantly enhance contributions to the field of AI.
References:
[1] MLOps, https://www.databricks.com/glossary/mlops
[2] What Is MLOps, https://ml-ops.org/content/mlops-principles
[3] MLflow component, https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow
[4] Getting Started With MLflow, https://saturncloud.io/blog/getting-started-with-mlflow/
[5] MLflow examples, https://github.com/amesar/mlflow-examples
[6] MLflow examples, https://github.com/RoboticsAndCloud/mlflow-examples