Machine-learning life-cycle management using MLflow
Honeybadger Staff
Posted on February 16, 2024
This article was originally written by Aditya Raj on the Honeybadger Developer Blog.
Machine-learning tasks are iterative by nature. While working on a machine-learning project, we often need to change datasets, algorithms, and hyperparameters to achieve maximum accuracy. In this process, we need to keep a record of all the algorithms, trained models, and their metrics. Tracking all the changes in a project over time can be cumbersome. This is where MLflow comes in handy. This article discusses how to manage the entire life cycle of a machine-learning project using MLflow.
What is the machine-learning life cycle?
There are several steps in a machine-learning project, such as data cleaning, model training, and model deployment. The machine-learning life cycle includes all the steps used to develop, test, and deploy machine-learning models.
In a machine-learning project, we perform a subset of the following tasks.
- Data collection: The first step in any machine learning or data science project is to collect and pre-process the data. In data collection, we identify the sources of data, collect it, clean it, and prepare it for analysis.
- Data preparation: Data preparation helps us convert the pre-processed data into a format that we can use to train a machine-learning model. It involves data preprocessing techniques, such as data normalization, feature selection, feature extraction, and data transformation.
- Model selection: In model selection, we select the appropriate machine-learning model for our use case. The choice of model depends on the nature of the problem, the size of the data, and the type of data.
- Model training: After selecting a machine-learning model, we train it using the prepared data. While training the model, we use different samples of training data, as well as different hyperparameters, to optimize the model's parameters and achieve the desired level of accuracy.
- Model testing: Once the model is trained, we test it on a separate test dataset to evaluate its performance. The test dataset is used to measure the model's accuracy and generalization performance. Each model trained using different data samples and hyperparameters is tested for accuracy and generalization.
- Model evaluation: After testing the trained models, we evaluate their accuracy. The evaluation process helps identify shortcomings in the model and assess its overall performance.
- Model deployment: Once the model is trained and evaluated, we can deploy it in a production environment. This step involves integrating the model into the production system and testing its performance under real-world conditions.
- Monitoring and maintenance: Once the model is deployed, it needs to be monitored and maintained to ensure that it continues to perform accurately. This involves monitoring the model's performance, retraining it periodically, and updating it as needed.
All the above tasks constitute the machine-learning life cycle.
You can observe that a machine-learning project involves many steps, and each step can have several sub-tasks. This makes it difficult for machine-learning engineers and data scientists to handle large machine-learning projects. In this situation, we adapt software development principles and operations to handle machine-learning projects. Often, we refer to the processes and principles as machine-learning operations (MLOps).
What is MLOps in machine learning?
MLOps is the practice of applying development operations (DevOps) principles to the machine-learning life cycle. MLOps is a framework for managing the entire machine-learning life cycle from development to deployment and maintenance. It involves the integration of various tools and practices to streamline the machine-learning workflow and enable the automation of key processes in the entire project life-cycle.
MLOps covers a wide range of activities, including data management, model development, testing, deployment, monitoring, and maintenance. Here are some of the key components of MLOps:
- Version control: Version control is essential in machine learning. It helps us to manage each version of code, data, and models effectively. Using version control, we can collaborate effectively and track changes in code, data, and features of the machine-learning model over time.
- Continuous integration and continuous deployment (CI/CD): In a machine-learning cycle, we need to evaluate the model performance in the production environment, retrain them, and deploy the updated version. By using CI/CD, we can continuously test, integrate, and deploy machine-learning models to the production environment.
- Monitoring: Monitoring is an essential part of MLOps. We need to monitor the performance of models in production and identify issues that need to be addressed. This helps us make sure that the machine-learning model works effectively.
- Collaboration: Collaboration is critical in machine learning, as it helps teams work together effectively. MLOps tools help teams collaborate on code, data, and models and ensure that everyone is on the same page.
MLOps helps ensure that machine-learning models are developed, deployed, and maintained in a systematic and efficient manner. It helps organizations scale their machine-learning operations and deliver value to their customers more quickly and reliably.
Why do we need MLOps?
Machine-learning projects have entirely different life cycles compared to traditional software projects. While ML models can provide significant benefits in predictive and analytical tasks, they can also be complex, and their performance can be impacted by a wide range of factors.
Because machine learning is becoming more common in a wide variety of industries and increasing in importance, we need to define and implement processes to effectively handle the machine-learning life cycle. Thus, we need MLOps.
Following are some of the main reasons we need MLOps while working on a machine-learning project.
- Reproducibility: Machine-learning models are developed using large amounts of data, and their performance can vary depending on the specific data used in training. MLOps helps us ensure that models are developed using standardized processes and that their performance is reproducible across different environments.
- Scalability: Machine-learning models are often computationally intensive. They require significant computing resources to train and run. MLOps helps organizations scale their ML operations to handle large amounts of data and processing requirements using standard practices.
- Efficiency: We need to frequently update machine-learning models as new data becomes available or as the underlying business requirements change. MLOps helps us automate the process of updating and deploying models, saving time and improving efficiency.
- Risk management: Machine-learning models can be complex and may run into inadvertent errors if they are not properly tested and deployed. This can even lead to project failure and subsequent economic losses for the organization. MLOps helps us manage the risk associated with machine-learning models by providing tools for testing, monitoring, and auditing models.
- Collaboration: Machine-learning models are typically developed by cross-functional teams that include data scientists, developers, and business stakeholders. MLOps helps these teams collaborate more effectively by providing tools for version control, code sharing, and workflow management.
What is MLflow?
MLflow is an open-source MLOps tool for managing the machine-learning life cycle. It is designed to simplify the process of building, training, deploying, and monitoring machine-learning models. MLflow provides a comprehensive set of tools for managing the end-to-end machine-learning life cycle, including experiment tracking, model management, and deployment.
Here are some of the key features of MLflow:
- Experiment tracking: MLflow provides tools for recording and visualizing metrics, parameters, and artifacts from experiments. This allows us to keep track of each execution of the machine-learning models and easily compare their performance.
- Model management: MLflow provides a model registry for storing and versioning trained models. This allows data scientists to keep track of different versions of their models and easily deploy them to production.
- Model serving: MLflow provides a serving infrastructure for deploying models to production. This includes REST API endpoints for making predictions and deploying models to cloud platforms.
- Integration: MLflow integrates with a variety of machine-learning frameworks. You can use MLflow with TensorFlow, PyTorch, and Scikit-learn while creating a machine-learning project in Python. This allows you to use preferred tools and frameworks to build and train models while tracking all the models on the go.
You can also use MLflow with different programming languages like Python, Scala, Java, and R using its APIs. In the next sections, we will discuss different components of MLflow and how we can use them to manage different tasks in a machine-learning project.
Components of MLflow
MLflow has four main components: MLflow Tracking, MLflow Models, MLflow Model Registry, and MLflow Projects. Each of these components facilitates specific tasks as discussed below.
MLflow Tracking
As the name suggests, we use MLflow Tracking to track and record the metrics, parameters, models, and execution environments across different runs of a machine-learning program.
- We can record parameters, code versions, models, and metrics for each run manually or automatically. MLflow allows us to record each component of the project environment separately using specific functions. If you want to record all the components automatically, MLflow also provides ways to do so.
- After recording components from different runs, you can organize and visualize the runs of the project to compare various metrics and parameters. For this, you can use the graphical user interface as well as the APIs provided for different programming languages.
- MLflow Tracking helps you store datasets, parameters, binary files, metrics, and code in your local system separately for each run. Additionally, you can use an SQLAlchemy-compatible database server, such as MySQL or PostgreSQL, to store all the information about each run of the code. The database doesn't store binary files. However, it helps you maintain records of where the binary files from each run are kept.
- Using MLflow Tracking, you can also track different files in cloud data platforms. You can track models in Amazon S3, Google Cloud Storage, an SFTP server, Azure Blob Storage, a shared NFS file system, and others.
MLflow Model Registry
The MLflow Model Registry component of MLflow helps us store and track machine-learning models from training to deployment. We can create a model registry using a database, such as MySQL, and register all the models in the database.
- MLflow Model Registry helps you build a central repository to register machine-learning models with proper descriptions, names, versions, stages, and other metadata.
- After registering a model, you can update it, keep track of all its versions, and change its metadata as required. You can also assign stages to machine-learning models to indicate where each model currently is in its life cycle.
- With the MLflow Registry component, you can also maintain better control and governance of the models to avoid unwanted incidents. It allows you to record stage transitions of machine-learning models, request changes in the stage and metadata, and review and approve changes. This helps the team members in a machine-learning project work efficiently.
MLflow Models
The MLflow Models component helps us to handle trained and untrained machine-learning models.
- With MLflow Models, you can package machine-learning models using standard formats provided by MLflow. You can then use the packaged models on different platforms.
- The MLflow Models component also allows you to load and use an existing model from the stored machine-learning models.
- You can also deploy machine-learning models using the MLflow Models components. MLflow provides deployment tools that help you quickly deploy machine-learning applications in a local or cloud environment.
MLflow Projects
The MLflow Projects component is used to package machine-learning models with proper specifications. It allows you to specify the requirements and software environment for executing the machine-learning model.
Now that we have discussed the basics of MLOps and MLflow, let us discuss how to manage the machine-learning life cycle using MLflow.
Working with MLflow using the sklearn module in Python
We can use MLflow to work with different software modules. In this tutorial, we will demonstrate the functions of MLflow using the sklearn module in Python.
Install and set up MLflow
To run the code in this tutorial, you will need the sklearn and mlflow modules in Python. You can install both modules using the following command.
pip3 install mlflow scikit-learn
You will also need an SQLAlchemy-compatible database management system, such as MySQL or PostgreSQL. I will use MySQL for this tutorial. As a prerequisite, create a separate database in your DBMS to store data related to this tutorial. I have created a database named `mlflowdb` as shown below.
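If you would rather create the database from Python instead of a MySQL shell, here is a minimal sketch using the pymysql driver (the same driver used by the `mysql+pymysql` connection string below; the credentials are the ones used later in this tutorial, so replace them with your own):

import pymysql

# Connect to the local MySQL server (credentials used in this tutorial)
connection = pymysql.connect(host="localhost", user="Aditya", password="Mysql1234#")
try:
    with connection.cursor() as cursor:
        # Create the database that will back the MLflow tracking server
        cursor.execute("CREATE DATABASE IF NOT EXISTS mlflowdb")
finally:
    connection.close()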
After creating the database, we need to start MLflow Server. For this, we need to specify the location of the folder where the artifacts from the MLflow executions are stored. We will use the following directory to store the folder containing the artifacts.
Start MLflow Server using a command prompt
Before we start working on the machine-learning project, we need to start MLflow Server using the following command.
mlflow server --backend-store-uri location-of-database --default-artifact-root location-of-directory-for-storing-artifacts
- You need to pass the address of the database in place of the `location-of-database` variable, with a specified username and password. I have used `mysql+pymysql://Aditya:Mysql1234#@localhost/mlflowdb`.
  - `Aditya` is my username for logging in to the MySQL database.
  - `Mysql1234#` is the password for logging in to MySQL.
  - I have the MySQL database installed on my computer. That's why the host is specified as `localhost`.
  - `mlflowdb` is the name of the database.
- In place of `location-of-directory-for-storing-artifacts`, you need to specify the location where artifacts need to be saved. I will save them to the `/home/aditya1117/HoneyBadger/honeybadger-codes/mlruns` directory.
The entire command looks as follows.
mlflow server --backend-store-uri mysql+pymysql://Aditya:Mysql1234#@localhost/mlflowdb --default-artifact-root /home/aditya1117/HoneyBadger/honeybadger-codes/mlruns
After executing the above command, MLflow Server will start on your system at port `5000`. You can observe this in the following image.
After starting the server, a folder named `mlruns` will be created in the `/home/aditya1117/HoneyBadger/honeybadger-codes/` directory, which previously contained only two folders.
The MLflow server runs on port `5000`. Hence, you can open it in your browser at `localhost:5000` or `127.0.0.1:5000`. You will see the following output on the screen:
Since we haven't executed any code, the MLflow server shows no data. Once we start experiments, we can observe the output in the GUI.
Track machine-learning models using MLflow Tracking
To track machine-learning models using MLflow Tracking, we will first create an MLflow experiment. Then, we will run the experiment and log all the metrics, data, and parameters to the MLflow server. Finally, we will go to the MLflow server to track different models. Next, we’ll discuss each step.
Create an MLflow experiment
To create an experiment in MLflow, we will first import the necessary modules into our program. As we will be using the K-nearest neighbors (KNN) classification algorithm to demonstrate the functions in MLflow, let us import both modules using import statements.
import mlflow
from sklearn.neighbors import KNeighborsClassifier
After importing the modules, we will create an MLflow experiment. For this, we first need to specify the address of the tracking server for the experiment.
To specify the address of the tracking server, we will use the `set_tracking_uri()` function in the mlflow module. This function takes the address of the tracking server and binds the experiment to the server. As our MLflow server is running at `localhost:5000` (that is, `127.0.0.1:5000`), we will pass this address to the `set_tracking_uri()` function as shown below.
tracking_server_uri = "http://localhost:5000"
mlflow.set_tracking_uri(tracking_server_uri)
After specifying the address of the tracking server, we will start the experiment. For this, we will use two functions.
- To start a new experiment, we will use the `create_experiment()` function. This function takes the name of the experiment as its input argument. If the system doesn't contain an experiment with the same name, the `create_experiment()` function creates an experiment and returns the experiment ID. If the system already contains an experiment with the given name, the `create_experiment()` function raises an error.
- If you want to restart an experiment that is already present in the system, you can use the `get_experiment_by_name()` function. This function takes the name of the experiment as its input argument and returns an object describing the experiment. We can then use the `experiment_id` attribute of this object to get the identifier for the experiment.
To start an experiment, we will use both functions. First, we will try to create a new experiment with a given name using the `create_experiment()` function. If another experiment with the same name already exists, the program will raise an exception. Otherwise, the `create_experiment()` function will create a new experiment and return the experiment ID.

If `create_experiment()` raises an exception, we know that there is already an experiment with the given name. In this case, we will use the `get_experiment_by_name()` function to retrieve the existing experiment. You can observe this in the following example.
experiment_name = 'KNNUsingMLFlow'
try:
    # Create a new experiment and get its ID
    exp_id = mlflow.create_experiment(name=experiment_name)
except Exception:
    # The experiment already exists, so reuse its ID
    exp_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
Once we create an experiment, it will appear on the MLflow Tracking server. You can go to the browser and refresh the tracking server URL. The screen gets updated as shown below.
In the above image, you can observe that an experiment with the name `KNNUsingMLFlow` has been created. You can select the experiment to see its details as shown below.
As we haven't started the experiment, the above screen shows no data.
Now, let us start the experiment and record different parameters and metrics on the MLflow server.
Start the MLflow experiment
To start the experiment, we will use the `start_run()` function. This function takes the experiment ID as an input argument to the `experiment_id` parameter. After execution, it starts a run of the experiment, and we can record parameters and metrics in the MLflow server.
- To record a parameter, we will use the `log_param()` function. This function takes the name of the parameter as its first input argument and the value of the parameter as its second input argument. After execution, it logs the parameter to the MLflow server.
- To record a metric, we will use the `log_metric()` function. This function takes the name of the metric as its first input argument and the metric value as its second input argument. After execution, it logs the metric to the MLflow server.
- To save a model, we will use the `log_model()` function. This function is specific to the machine-learning module we are using. As we are using the sklearn module, we will use the `mlflow.sklearn.log_model()` function. It takes the variable containing the model as its first input argument and the artifact name for the model as its second input argument. After execution, it saves the model to the directory specified for the MLflow server.
- After executing the experiment, we can stop it using the `end_run()` function.
You can observe all the steps in the following code:
with mlflow.start_run(experiment_id=exp_id):
    # Create a list of data points
    data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
    # Create a list of class labels
    class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
    # Create an untrained model
    n_neighbors = 4
    untrained_model = KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean")
    # Train the model using the fit method
    trained_model = untrained_model.fit(data_points, class_labels)
    # Log the parameters, metric, and models to the MLflow server
    mlflow.log_param('n_neighbors', n_neighbors)
    mlflow.log_param('data_points', data_points)
    mlflow.log_param('class_labels', class_labels)
    mlflow.log_metric('number_of_classes', 3)
    mlflow.sklearn.log_model(untrained_model, "untrained_model")
    mlflow.sklearn.log_model(trained_model, "trained_model")
mlflow.end_run()
In the above code, we have used the KNN classification algorithm to build a classification model. It is a simple and effective algorithm for classifying new data points based on existing labeled data points. In KNN classification, the input data consists of a set of labeled instances, and the goal is to predict the class label of an unknown instance based on its features and the labels of its nearest neighbors. When we pass a new data point to the KNN classifier, it finds the K nearest neighbors of the point among the existing labeled data points. It then takes the majority vote of the class labels of these K neighbors and returns the majority class label as the prediction.
In the code, we have used 15 labeled sample data points to train the classification model. Here, the data points are stored in the `data_points` list, and their labels are stored in the `class_labels` list.
- To create the machine-learning model for KNN classification, we have used the `KNeighborsClassifier()` function defined in the sklearn module. The `KNeighborsClassifier()` function takes the number of neighbors as an input argument to the `n_neighbors` parameter. It also takes the distance metric to use for calculating the nearest neighbors. We will use the Euclidean distance metric, and hence, we will pass the string `"euclidean"` to the `metric` parameter as an input argument.
- After execution, the `KNeighborsClassifier()` function returns an untrained KNN classification model. We can train this untrained model with a labeled dataset to classify new data points.
- To train the KNN classification model, we will use the `fit()` method. The `fit()` method, when invoked on the untrained model, takes the training data points as its first input argument and the class labels of the input data points as its second input argument. After execution, it returns the trained KNN classification model.
- Once we get the trained KNN classification model, we can use it to predict the class labels of new data points using the `predict()` method, as shown in the sketch after this list.
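For instance, here is a quick sketch of calling `predict()` on the trained model from the code above. The query point is one of the training points, so all of its nearest neighbors carry the label `C2`:

# Predict the class label of a query point using the trained model
print(trained_model.predict([(2, 10)]))
# Expected output: ['C2']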
Instead of using separate functions to log the parameters, models, and metrics, you can automatically log each object in the program environment while the experiment runs. For this, you can use the `mlflow.sklearn.autolog()` function as shown below.
with mlflow.start_run(experiment_id=exp_id):
    mlflow.sklearn.autolog()
    # Create a list of data points
    data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
    # Create a list of class labels
    class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
    # Create an untrained model
    untrained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    # Train the model; autolog records the parameters and model automatically
    trained_model = untrained_model.fit(data_points, class_labels)
mlflow.end_run()
The above code will also record all the metrics and parameters in the MLflow server.
After executing an experiment, if you go to the browser and refresh the URL of the MLflow server, you will observe that the experiment is recorded in the MLflow server with all the models, metrics, and parameters.
The binary files of the machine-learning models are saved in the directory that we specified while starting the MLflow server. As we have specified the `mlruns` folder to store artifacts, you will observe that a new folder is created in the `mlruns` directory as shown below.

Here, `1` is the experiment ID. For each experiment, a separate folder is created with the experiment ID as its name.
Inside the experiment ID folder, you will find another folder with a long alphanumeric name. There can be multiple folders in the experiment ID folder for a given experiment. Each time we run an experiment, a separate folder is created for each run, with the run ID as the name of the folder.
Inside the run ID folder, you will see a folder named `artifacts` as shown below. This folder contains all the models saved during the experiment.
Inside the artifacts folder, you will see a separate folder for each saved model in a single run for the experiment. As we have saved both the trained and untrained models, you will see two folders as shown below.
Inside the directory of a model, you will find configuration files, a binary file of the model, a YAML file describing the environment, and a `requirements.txt` file describing the dependencies. You can observe them in the following image.
The `requirements.txt` file looks as follows.
The file describing the execution environment looks as follows.
The `conda.yaml` file looks as follows.
You can observe that only the machine-learning models are saved in the file system, so where are the metrics and parameters stored?
They are stored in the database connected to the MLflow server.
As you can see in the following image, the database connected to MLflow doesn't contain any tables before the experiment.
Once we execute the experiment, different tables are created for storing metrics, parameters, the location of models, information regarding registered models, etc., as shown below.
In the `metrics` table, each metric is stored with its value and the run ID of the experiment as shown below.
Similarly, all the parameters are stored in the `params` table with their names, values, and the run IDs of the experiments.
You can track and visualize all the metrics and parameters in the GUI of the MLflow server. For this, go to any experiment and click on any run of the experiment. You will see the following screen with all the parameters, metrics, and models from the particular run of the experiment.
If you have run the same experiment at different times with different or the same parameters, each run is recorded in the MLflow server. You can observe this in the following image.
To compare metrics and parameters in different runs, you can select two runs as shown in the above image. Then, click on the `Compare` button. You will see the following output on the screen.
In the above image, you can observe that the `n_neighbors` parameter is plotted against the `number_of_classes` metric for each run. You can also view a scatter plot by clicking on the `Scatter Plot` button above the visualization as shown below.
You can also compare the metrics and parameters for each run side by side using the `Box Plot` button above the visualization as shown below.
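If you prefer to compare runs programmatically rather than in the GUI, here is a minimal sketch using `mlflow.search_runs()`. It assumes the `exp_id` variable defined earlier and the parameter and metric names logged above; the function returns a pandas DataFrame:

# Fetch all runs of the experiment as a pandas DataFrame
runs = mlflow.search_runs(experiment_ids=[exp_id])
# Logged parameters and metrics appear as "params.*" and "metrics.*" columns
print(runs[["run_id", "params.n_neighbors", "metrics.number_of_classes"]])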
At this point, we have discussed all the steps to track machine-learning models using the MLflow module.
Now, let us discuss how to manage machine-learning models using the MLflow registry component.
Manage machine-learning models using the MLflow registry
After analyzing different models using MLflow Tracking, you may want to move some models to production or archive some of the models. You might also need to add custom descriptions and tags to the models. We can perform all these operations using the MLflow model registry component.
Register a model in MLflow
To register a model, we can utilize either the user interface of the MLflow server or Python APIs. Let us first discuss how to register a model using the user interface. For this, we first need to open a particular run of the experiment, as we did while tracking.
Below the description, parameters, metrics, and tags, you will find the Artifacts tab as shown in the following image. As you can see, there are two artifacts: `trained_model` and `untrained_model`. This is because we logged both models previously using Python code.
Let us register the `trained_model` artifact. For this, select the `trained_model` artifact by clicking on its name. As you can see, a button named `Register Model` appears on the right side of the screen for registering the model. Click on the `Register Model` button.
Register a new model in MLflow
Once you click the `Register Model` button, the following dialogue screen will appear.
The above screen appears only when you are registering your first model. Otherwise, you will see a different screen, as shown in the next subsection. In the above screen, click on the `Select a model` dialogue box. You will then see the following.
On the above screen, click on `Create New Model`. You will see the following screen.
In the above dialogue box, fill in the model name and click on the `Register` button. Your model will be registered as shown below.
Register a new version of an existing model in MLflow
You can also register a new version of a previously registered model. For this, go to the model artifact and click on the Register Model button. You will see the following dialogue box.
In the above dialogue box, when you click on `Select a model`, all the previously registered models will appear below the `Create New Model` button. If you are registering a new version of an existing model, select the name of that model. Let us select `TrainedKNNModel`.
After selecting the name of the model, you will see the following screen; click on `Register`.
After clicking on the `Register` button, your model will be registered as a new version of the existing model, and the following screen will appear.
You can track all the registered models by clicking on the `Models` tab beside the `Experiments` tab at the top of the screen. You can also go to `http://127.0.0.1:5000/#/models` to have a look at all of your registered models. There, you will see output as shown below.
Register a model using Python code
You can also register a model using Python code. For this, you can use the `register_model()` function. This function takes the location of the model artifact as the input argument to the `model_uri` parameter and the name of the model as the input argument to the `name` parameter. After execution, it registers the model in the MLflow server.
# Register a model
model_name = "UntrainedKNNModel"
model_uri = "/home/aditya1117/HoneyBadger/honeybadger-codes/mlruns/1/c23f123eb48f4e45b4f08b506d7a2b80/artifacts/untrained_model"
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
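As a quick follow-up, `register_model()` returns a `ModelVersion` object, so you can inspect the registered name and version right away:

# model_details describes the newly registered model version
print(model_details.name, model_details.version)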
All the registered models are recorded in the `registered_models` table in the database. You can observe this in the following image.
Change the description of a registered model in MLflow
To change the description of a registered model, you can select any registered model from the `Models` tab. You will see the following screen.
On the above screen, you can observe that there is an `Edit` option near the description. Click on `Edit`. The following dialogue box will appear.
In the above text box, fill in the new description of the model and click on the `Save` button. The description will be updated as shown below.
We can also change the description of a registered model using Python code. For this, we’ll need to take the following steps.
- First, we will create an MLflow client using the `MlflowClient()` function.
- Next, we will use the `update_registered_model()` method to update the description of the model. This method takes the name of the registered model as input to the `name` parameter and the new description as input to the `description` parameter. After execution, it changes the description of the specified model.
You can observe the entire process in the following example.
Below is the description of the `UntrainedKNNModel` before executing the code to modify the description.
Now, let us execute the following code.
from mlflow.tracking.client import MlflowClient

# Create an MLflow client
client = MlflowClient()
model_name = "UntrainedKNNModel"
description = "This is the first untrained model registered using code."
# Update the description of the registered model
updated_model_details = client.update_registered_model(name=model_name, description=description)
After executing the above code, the description of the model will be changed as shown in the following image.
Change the stage of a model in MLflow
When a model is initially registered in the MLflow server, its stage is set to `None`. Depending on the requirements, we can set the stage of the model to `Staging`, `Production`, or `Archived`.
To change the stage of a model in the MLflow server GUI, go to the `Models` tab.
In the above tab, click on the name of the model that you want to modify. Let us change the stage of the `TrainedKNNModel`. After clicking on the model name, you will see the following output.
In the above image, click on the model version whose stage you want to change. After clicking the model version, you will see the following screen.
In the above screen, you can observe that the stage of the current model is set to `None`. Click on the dropdown containing the `None` stage. You will see the following options.
In the above image, you can change the stage of the model as you want. For instance, if you want to set the stage of the model to `Production`, click on `Transition to Production`. After this, you will see the following output.
In the above image, click on the `OK` button. The stage of the model will be changed to `Production` as shown below.
Instead of the user interface, we can also change the stage of the model using Python code. For this, we will use the `transition_model_version_stage()` method of the MLflow client. This method takes the model name, model version, and the destination stage as input arguments to the `name`, `version`, and `stage` parameters, respectively. After execution, it changes the stage of the specified model version.
To understand this, consider the following model. It is currently in the `None` stage.
Now, let us execute the code to change the stage of the model.
from mlflow.tracking.client import MlflowClient

client = MlflowClient()
model_name = "UntrainedKNNModel"
model_version = 1
# Move version 1 of the model to the Staging stage
client.transition_model_version_stage(name=model_name, version=model_version, stage="staging")
After executing the above code, the stage of the model is set to `Staging` as shown below.
Delete a model from MLflow
We can either delete a particular version of a model or we can delete the entire model from MLflow.
To delete a particular version of a model, go to the page containing details of the particular version of the model as shown in the following image.
In the above image, first set the stage of the model to `Archived`. This is because you cannot delete models that are in the `Staging` or `Production` stage. Once you set the stage of the model to `Archived`, click on the three dots in the top-right corner of the screen. After clicking, you will see the option to delete the model as shown below.
In the above image, click on the `Delete` button, and your model will be deleted.
Instead of a single version, you can also delete an entire model and all its versions. For this, go to the `Models` tab as shown below.
Click on the name of the model you want to delete. Let us delete the `TrainedKNNModel`. After clicking on the model name, you will see the following screen.
Before deleting the model, you need to make sure that all the versions of the current model are in the `Archived` or `None` stage. Since all of our versions are in the `None` stage, we can delete this model. For this, click on the three dots in the top-right corner. You will see the `Delete` button as shown below.
Click on the `Delete` button. A confirmation dialogue will appear as shown below.
Click on the `Delete` button in the above dialogue. All the versions of the model will be deleted.
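This section used the user interface, but you can also delete models from Python. Below is a minimal sketch using the `MlflowClient` deletion methods; the model name and version are the ones used above, and the same stage restrictions apply (move versions to `Archived` or `None` first):

from mlflow.tracking.client import MlflowClient

client = MlflowClient()
# Delete a single version of a registered model
client.delete_model_version(name="TrainedKNNModel", version=1)
# Delete the registered model along with all of its remaining versions
client.delete_registered_model(name="TrainedKNNModel")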
Working with trained machine-learning models using MLflow models
You can work with trained models using the MLflow Models component. Here, we can load an existing model from the file system and work with the model. We can also deploy the model using the MLflow server.
Load trained models into a Python program using MLflow
To load a trained machine-learning model from the file system into our program, we can use the `mlflow.pyfunc.load_model()` function. This function takes the URI of the model and loads it into the program. We can then use this model to predict class labels for new data points using the `predict()` method.

The `predict()` method, when invoked on the trained machine-learning model, takes a list of data points to classify. After execution, it returns an array containing a class label for each data point. You can observe this in the following example.
# Load the trained model logged in an earlier run of the experiment
logged_model = 'runs:/c23f123eb48f4e45b4f08b506d7a2b80/trained_model'
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict the class label of a new data point
loaded_model.predict([(2, 10)])
Output:
array(['C2'], dtype='<U2')
In the above example, the model assigns the class label `C2` to the data point `(2, 10)`. For this, the `predict()` method finds the three nearest data points to `(2, 10)` in the training data. These are `(2, 10)`, `(3, 10)`, and `(4, 9)`, with class labels `C2`, `C2`, and `C2`. As all three nearest data points have the class label `C2`, the majority class label is also `C2`. Hence, the query data point `(2, 10)` is assigned the class label `C2`, and the `predict()` method returns an array containing the value `C2` as its output.
If you want to predict the class labels of two or more data points, you can pass all the points to the `predict()` method in the input list. After executing the `predict()` method, you will get an array of class labels for all the query points.
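For example, the following sketch classifies two query points in one call; both points appear in the training data above, so their expected labels follow from their neighbors:

# Classify two query points in a single call
loaded_model.predict([(2, 10), (6, 5)])
# e.g., array(['C2', 'C1'], dtype='<U2')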
To load a registered model, we don't need to specify the run URI of the model. Instead, we can use the `name` and `stage` of the registered model to load it into our program as shown below.
# Load the registered model by its name and stage
registered_model = "models:/TrainedKNNModel/production"
loaded_model = mlflow.pyfunc.load_model(registered_model)
loaded_model.predict([(2, 10)])
Output:
array(['C2'], dtype='<U2')
Deploy models in MLflow Server
To deploy a model in MLflow Server, we use the following syntax.
mlflow models serve -m location-of-the-model -p port_number --env-manager=local
Here is an explanation of the above code:
- `location-of-the-model` is the location of the directory containing the model. I will set it to `/home/aditya1117/HoneyBadger/honeybadger-codes/mlruns/1/8e3ec00972d94c8d851c043379a337ed/artifacts/trained_model`.
- `port_number` is the port at which we want to serve the model. I will set it to `1234`.
- I have set the `env-manager` option to `local` because the model will be deployed in the local environment.
The complete command looks as follows.
mlflow models serve -m /home/aditya1117/HoneyBadger/honeybadger-codes/mlruns/1/8e3ec00972d94c8d851c043379a337ed/artifacts/trained_model -p 1234 --env-manager=local
After executing the above command, the model will be deployed at `localhost:1234` as shown below.
Now we can send requests with input data to `localhost:1234/invocations` or `127.0.0.1:1234/invocations`, and the model will return its output. For instance, let us use the `curl` command to send a request asking the model to classify the point `(2, 10)`. For this, we will use the following syntax.
curl -d "[[2, 10]]" -H 'Content-Type: application/json' 127.0.0.1:1234/invocations
We will see the output in the terminal as follows.
You can observe that the model has returned the value `C2` in a list after classifying the point `(2, 10)`. Hence, the model is deployed correctly at the specified port number.
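Instead of curl, you can also query the deployed model from Python. Here is a minimal sketch using the requests library; it mirrors the curl command above (note that the expected payload format can differ across MLflow versions, so adjust it if your server rejects a bare JSON list):

import requests

# Send a query point to the model served at port 1234
response = requests.post(
    "http://127.0.0.1:1234/invocations",
    json=[[2, 10]],
    headers={"Content-Type": "application/json"},
)
print(response.text)  # e.g., ["C2"]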
Conclusion
In this article, we have discussed how to manage the entire life cycle of a machine-learning project using MLflow. We created an untrained model, trained it repeatedly with different parameters, compared the results, and deployed a model. We also discussed how to delete a model and change the metadata of a model.
I suggest that you execute the code in this article on your own system and experiment with the code and MLflow UI to better understand the concepts. This will help you easily manage your machine-learning projects.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy learning!