MLflow
MLflow Integration Overview
MLflow is an open-source platform designed to help data scientists track, manage, and compare machine learning experiments. It provides a structured approach to evaluating model performance, helping teams identify the best-performing models for their use case.
RapidCanvas supports MLflow directly within workspaces, making it easier than ever to integrate experiment tracking into your model development workflow.
Key Capabilities
Create Experiments: Users can set up and manage multiple runs by experimenting with different algorithms, datasets, and parameters using code recipes. This allows a wide range of modelling approaches to be tested within the same project.
Track Performance Metrics: Each run can log important metrics such as accuracy, precision, recall, F1 score, and more. These metrics are tracked over time, enabling users to monitor how different models perform under various configurations.
Compare Runs Across Flows: Users can view and compare the results of different runs side by side in the MLflow UI. This makes it easy to identify and select the best-performing model for a specific experiment.
Working with MLflow in RapidCanvas
MLflow is now integrated into RapidCanvas workspaces to help data scientists efficiently manage, track, and compare machine learning experiments. Each workspace in RapidCanvas operates independently, and MLflow instances are isolated per workspace.
You can interact with MLflow through code recipes by ensuring the necessary libraries are installed in the environment.
To work with MLflow, you’ll also need to import helper functions and initiate MLflow tracking within your code.
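For example, a minimal setup in a code recipe might look like the sketch below. The tracking URI and experiment name are placeholders; in a RapidCanvas workspace these are typically configured for you by the workspace's MLflow integration and helper functions.

```python
import mlflow

# Placeholder tracking URI; in a RapidCanvas workspace this is normally
# configured by the workspace's MLflow integration / helper functions.
mlflow.set_tracking_uri("http://localhost:5000")

# Select (or create) the experiment that subsequent runs will be logged under.
mlflow.set_experiment("titanic-survival")
```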
Why Use MLflow?
Building a machine learning model is rarely a one-step process. Data scientists often go through multiple iterations, experimenting with different algorithms, parameters, and feature combinations. Manually logging the outcomes of these experiments—like accuracy, precision, and F1 score—in spreadsheets is both tedious and error-prone.
MLflow simplifies this by logging key metrics and parameters for each model run. This not only saves time but also makes it easy to compare results across multiple experiments and choose the best-performing model.
How It Works – Example with Titanic Dataset
Imagine you're building models using the Titanic dataset. You might perform the following:
Run 1: Use a specific feature with a basic algorithm; log accuracy, precision, F1 score.
Run 2: Try a different algorithm with altered parameters or features; track the new metrics.
Run 3: Use another algorithm with different parameters again.
Each of these executions becomes a run in MLflow. When you open the MLflow UI, all the runs are listed with their respective metrics, allowing you to easily compare and determine which model performed best. If you are running hundreds of experiments, MLflow will keep track of every run—helping you analyse results at scale.
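The sketch below illustrates this pattern with standard MLflow and scikit-learn calls. The file name, feature columns, and algorithms are illustrative assumptions, not platform-provided syntax; in a code recipe the dataset would come from your project.

```python
import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split

# Illustrative Titanic-style data; replace with the dataset from your project.
df = pd.read_csv("titanic.csv")
X = df[["Pclass", "Age", "Fare"]].fillna(0)
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("titanic-survival")

# Each model configuration becomes one MLflow run with its own logged metrics.
for name, model in [
    ("logistic_regression", LogisticRegression(max_iter=500)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42)),
]:
    with mlflow.start_run(run_name=name):
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mlflow.log_param("algorithm", name)
        mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
        mlflow.log_metric("precision", precision_score(y_test, preds))
        mlflow.log_metric("f1", f1_score(y_test, preds))
```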
You can also write code in a code recipe to programmatically compare runs and select the one with the best metric (e.g., highest accuracy). The selected run can then be used to generate the best model within RapidCanvas using the provided helper functions.
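A minimal sketch of such a comparison using the standard mlflow.search_runs API; the experiment name and metric key are assumptions carried over from the example above.

```python
import mlflow

# Fetch all runs of the experiment as a DataFrame and sort by the logged accuracy metric.
runs = mlflow.search_runs(experiment_names=["titanic-survival"])
best_run = runs.sort_values("metrics.accuracy", ascending=False).iloc[0]

print("Best run id:", best_run["run_id"])
print("Best accuracy:", best_run["metrics.accuracy"])
```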
Try It Out
RapidCanvas provides sample MLflow syntax to help you get started quickly. To explore and test MLflow in your code recipes:
Open any code recipe.
Click on the Syntax option.
Choose the sample MLflow scripts provided to understand how to structure your experiments.
Tip: Each workspace has its own independent MLflow instance, so all experiment tracking remains isolated per workspace.
How to Access MLflow in RapidCanvas
To start using MLflow in RapidCanvas, simply navigate to the main menu of any workspace. From there, you’ll find the MLflow option readily available—just click to access and begin managing your machine learning experiments.
Requirements to Use MLflow
To run MLflow experiments using code recipes in RapidCanvas, certain Python libraries must be installed in the environment where the recipes are executed.
You can install these libraries in two ways:
At the environment level – making them available across all recipes using that environment.
At the recipe level – making them available only for a specific recipe.
Ensure that the necessary MLflow-related libraries are installed before executing any MLflow experiment to avoid runtime errors.
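As one illustrative option for a recipe-level install (the exact mechanism depends on how your environment exposes package installation; environment-level installs are done through the environment's package settings instead):

```python
import subprocess
import sys

# Illustrative recipe-level install of the MLflow-related libraries; at the
# environment level the same packages would be added via the environment settings.
subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow", "scikit-learn"])
```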
Get Started: Using MLflow in RapidCanvas
Follow these steps to begin using MLflow for tracking and comparing machine learning experiments within the RapidCanvas platform.
Step 1: Navigate to the Code Recipe from the Canvas
To begin writing code for ML experiments with different parameters, features, and algorithms, first open your project canvas. From there, select the Code Recipe option to access the coding interface where you can implement and run your MLflow experiments.
Step 2: Import Required Functions
To use MLflow within your code recipes, make sure to import the necessary helper functions:
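The exact helper imports are provided in the recipe's sample syntax; at a minimum, the standard MLflow import looks like this:

```python
# The RapidCanvas-specific helper imports are available from the sample syntax in
# the code recipe; the standard MLflow package itself is imported as follows.
import mlflow
```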
Step 3: Initialize Helper Functions
Set up and initialize the imported helper functions to enable interaction with MLflow. These functions help manage runs, log parameters and metrics, and streamline experiment tracking.
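A minimal sketch of this initialization using the standard MLflow API; the experiment name is a placeholder, and the platform's helper functions may handle this step for you.

```python
import mlflow

# Select (or create) the experiment that later runs will be attached to.
# The name below is a placeholder for your own experiment.
mlflow.set_experiment("titanic-survival")

# Optional sanity check: confirm which tracking server the recipe is talking to.
print(mlflow.get_tracking_uri())
```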
Step 4: Use Sample Syntax to Experiment with MLflow Runs
Copy the sample MLflow syntax into the Code tab of your code recipe to start experimenting with multiple MLflow runs. This code helps you log metrics, compare runs, and evaluate different models efficiently.
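A representative sketch of such an experiment is shown below: a simple hyperparameter sweep on a public scikit-learn dataset, used purely as a stand-in for your own data and models.

```python
import mlflow
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in dataset; in a code recipe this would be your project's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("sample-experiment")

# Sweep one hyperparameter; each value becomes its own tracked run.
for n_estimators in (50, 100, 200):
    with mlflow.start_run(run_name=f"rf_{n_estimators}"):
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
        mlflow.log_metric("f1", f1_score(y_test, preds))
```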
Step 5: View Results in the MLflow UI
After running the recipe, the experiment runs will be logged and displayed in the MLflow UI. You can open the MLflow UI to view details of each run, including parameters, metrics (like accuracy, precision, and F1 score), and artifacts. This allows you to compare results across runs and identify which model performs best.
Step 6: Pick the Best Model
After analysing the results of various experiment runs in the MLflow UI, identify the run that produced the best performance metrics (e.g., highest accuracy or F1 score).
You can then reference the run ID or parameters from this best-performing run in your code recipe to regenerate or finalize the model within the platform. This allows you to consistently reproduce and use the optimal model for deployment or further analysis.
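For example, a sketch of reloading the winning model by its run ID, assuming that run logged its model under the artifact path "model":

```python
import mlflow
import mlflow.sklearn

# Placeholder: paste the run ID of the best run from the MLflow UI or from search_runs.
best_run_id = "<run-id-of-the-best-run>"

# Reload the model that the winning run logged under the "model" artifact path.
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")
```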
The following function allows you to log and save the trained model as part of MLflow experiment tracking.
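A minimal sketch of that pattern using the standard MLflow API, assuming a scikit-learn model; the platform's own helper for saving models may differ.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Stand-in training data and model; replace with your own dataset and algorithm.
X, y = load_breast_cancer(return_X_y=True)

with mlflow.start_run(run_name="final_model"):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    # Save the trained model as a run artifact so it can be reloaded later
    # via runs:/<run_id>/model.
    mlflow.sklearn.log_model(model, "model")
```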