MLflow is an open-source platform designed to help data scientists track, manage, and compare machine learning experiments. It provides a structured approach to evaluating model performance, helping teams identify the best-performing models for their use case.
RapidCanvas supports MLflow directly within workspaces, making it easier than ever to integrate experiment tracking into your model development workflow.
Key Capabilities
Create Experiments
Users can set up and manage multiple runs by experimenting with different algorithms, datasets, and parameters using code recipes. This allows for a wide range of modelling approaches to be tested within the same project.
Track Performance Metrics
Each run can log important metrics such as accuracy, precision, recall, F1 score, and more. These metrics are tracked over time, enabling users to monitor how different models perform under various configurations.
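For example, a run can log its parameters and metrics through the standard MLflow logging API (a minimal sketch with toy labels, not tied to any particular dataset):
import mlflow
from sklearn.metrics import accuracy_score, f1_score

# Toy ground-truth labels and predictions, purely to illustrate metric logging
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

with mlflow.start_run(run_name="metric_logging_example"):
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_metric("accuracy", accuracy_score(y_true, y_pred))
    mlflow.log_metric("f1_score", f1_score(y_true, y_pred))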
Compare Runs Across Flows
Users can view and compare the results of different runs side-by-side using the MLflow UI. This makes it easy to identify and select the best-performing model for a specific experiment.
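Runs can also be pulled into a pandas DataFrame for a quick programmatic comparison (a sketch that assumes the runs have logged an accuracy metric, an F1 score, and a model_type parameter, as in the Titanic example later in this guide):
import mlflow

# Fetch all runs of an experiment as a DataFrame; metric and parameter
# columns are prefixed with "metrics." and "params." respectively
runs_df = mlflow.search_runs(experiment_names=["Titanic_Survival_Demo_Experiment"])
print(runs_df[["run_id", "metrics.accuracy", "metrics.f1_score", "params.model_type"]])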
Working with MLflow in RapidCanvas
MLflow is now integrated into RapidCanvas workspaces to help data scientists efficiently manage, track, and compare machine learning experiments. Each workspace in RapidCanvas operates independently, and MLflow instances are isolated per workspace.
You can interact with MLflow through code recipes by ensuring the necessary libraries are installed in the environment.
To work with MLflow, you’ll also need to import helper functions and initiate MLflow tracking within your code.
Why Use MLflow?
Building a machine learning model is rarely a one-step process. Data scientists often go through multiple iterations, experimenting with different algorithms, parameters, and feature combinations. Manually logging the outcomes of these experiments—like accuracy, precision, and F1 score—in spreadsheets is both tedious and error-prone.
MLflow simplifies this by logging key metrics and parameters for each model run. This not only saves time but also makes it easy to compare results across multiple experiments and choose the best-performing model.
How It Works – Example with Titanic Dataset
Imagine you're building models using the Titanic dataset. You might perform the following:
Run 1: Use a specific feature with a basic algorithm; log accuracy, precision, F1 score.
Run 2: Try a different algorithm with different parameters or features; track the new metrics.
Run 3: Try yet another algorithm and parameter combination; log its metrics as well.
Each of these executions becomes a run in MLflow. When you open the MLflow UI, all the runs are listed with their respective metrics, allowing you to easily compare and determine which model performed best. If you are running hundreds of experiments, MLflow will keep track of every run—helping you analyse results at scale.
You can also write code in a code recipe to programmatically compare runs and select the one with the best metric (e.g., highest accuracy). The selected run can then be used to generate the best model within RapidCanvas using the provided helper functions.
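As a rough sketch of that selection step (the experiment name and metric come from the Titanic example used later in this guide; the same pattern appears in the sample syntax):
import mlflow

client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name("Titanic_Survival_Demo_Experiment")

# Ask the tracking server for the single run with the highest accuracy
best_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=1,
)[0]
print(best_run.info.run_id, best_run.data.metrics.get("accuracy"))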
Try It Out
RapidCanvas provides sample MLflow syntax to help you get started quickly.
To explore and test MLflow in your code recipes:
Open any code recipe.
Click on the Syntax option.
Choose the sample MLflow scripts provided to understand how to structure your experiments.
Tip: Each workspace has its own independent MLflow instance, so all experiment tracking is isolated per workspace for security.
How to Access MLflow in RapidCanvas
To start using MLflow in RapidCanvas, simply navigate to the main menu of any workspace. From there, you’ll find the MLflow option readily available—just click to access and begin managing your machine learning experiments.
Requirements to Use MLflow
To run MLflow experiments using code recipes in RapidCanvas, certain Python libraries must be installed in the environment where the recipes are executed.
You can install these libraries in two ways:
At the environment level – making them available across all recipes using that environment.
At the recipe level – making them available only for a specific recipe.
Ensure that the following MLflow-related libraries are installed before executing any MLflow code, to avoid runtime issues:
mlflow-skinny==2.22.0
urllib3==1.26.18
Get Started: Using MLflow in RapidCanvas
Follow these steps to begin using MLflow for tracking and comparing machine learning experiments within the RapidCanvas platform.
Step 1: Navigate to the Code Recipe from the Canvas
To begin writing code for ML experiments using different parameters (features) and algorithms, first open your project canvas. From there, select the Code Recipe option to access the coding interface where you can implement and run your MLflow experiments.
Step 2: Import Required Functions
To use MLflow within your code recipes, make sure to import the necessary helper functions:
from utils.notebookhelpers.helpers import Helpers
from utils.notebookhelpers.mlflow_utils import MLflowUtils
Step 3: Initialize Helper Functions
Set up and initialize the imported helper functions to enable interaction with MLflow. These functions help manage runs, log parameters and metrics, and streamline experiment tracking.
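A minimal sketch of this initialization, matching the sample syntax shown in Step 4 (where 'contextId' is a placeholder):
from utils.notebookhelpers.helpers import Helpers
from utils.notebookhelpers.mlflow_utils import MLflowUtils

# Create (or reuse) the recipe context, then point MLflow tracking at the workspace instance
context = Helpers.getOrCreateContext(contextId='contextId', localVars=locals())
MLflowUtils.init(context)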
Step 4: Use the Sample Syntax to Experiment with MLflow Runs
Copy the sample MLflow syntax into the Code tab of your code recipe to start experimenting with multiple runs. This code helps you log metrics, compare runs, and evaluate different models efficiently.
Example Syntax
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
"""
Please ensure the following packages are installed in the environment either by checking the available internal packages or specifying them in the requirements.
seaborn mlflow-skinny==2.22.0 urllib3==1.26.18
"""
# Required imports
from utils.notebookhelpers.helpers import Helpers
from utils.notebookhelpers.mlflow_utils import MLflowUtils
context = Helpers.getOrCreateContext(contextId='contextId', localVars=locals())
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
MLflowUtils.init(context)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import mlflow
import mlflow.sklearn
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
import numpy as np
# Load and prepare data
df = sns.load_dataset("titanic")[["survived", "sex", "age", "pclass", "fare"]].dropna() # Add more features
X = df.drop("survived", axis=1)
y = df["survived"].values
# Preprocessing for 'sex'
X['sex'] = X['sex'].map({'male': 1, 'female': 0})
# For simplicity in this demo, we'll just use 'sex' for the baseline
X_baseline = X[["sex"]].values.reshape(-1, 1)
y_baseline = y
X_train_base, X_test_base, y_train_base, y_test_base = train_test_split(X_baseline, y_baseline, test_size=0.3, random_state=42, stratify=y_baseline)
# Set experiment
mlflow.set_experiment("Titanic_Survival_Demo_Experiment")
# --- Run 1: Baseline Model (Sex Only) ---
with mlflow.start_run(run_name="Baseline_Sex_Only"):
mlflow.log_param("features", "sex_only")
mlflow.log_param("model_type", "LogisticRegression")
mlflow.log_param("scaling", "None") # Explicitly state no scaling
model_base = LogisticRegression(solver="liblinear", random_state=42)
model_base.fit(X_train_base, y_train_base)
y_pred_base = model_base.predict(X_test_base)
y_pred_proba_base = model_base.predict_proba(X_test_base)[:, 1] # For ROC later
# Log metrics
mlflow.log_metric("accuracy", accuracy_score(y_test_base, y_pred_base))
mlflow.log_metric("precision", precision_score(y_test_base, y_pred_base, zero_division=0))
mlflow.log_metric("recall", recall_score(y_test_base, y_pred_base, zero_division=0))
mlflow.log_metric("f1_score", f1_score(y_test_base, y_pred_base, zero_division=0))
# Log model
mlflow.sklearn.log_model(model_base, "model_sex_only")
mlflow.set_tag("primary_model_path", "model_sex_only")
print("Baseline run logged.")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Prepare data with more features
X_full_df = X[['sex', 'age', 'pclass', 'fare']] # Using already processed X as DataFrame
y_full = y # y is already a NumPy array
# Using DataFrame for X_full_df to retain column names for feature importance plot later
X_train_full, X_test_full, y_train_full, y_test_full = train_test_split(
    X_full_df, y_full, test_size=0.3, random_state=42, stratify=y_full
)
# --- Run 2: Logistic Regression with More Features (No Scaling) ---
with mlflow.start_run(run_name="LogisticRegression_MoreFeatures_NoScale"):
    mlflow.log_param("features", "sex, age, pclass, fare")
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("scaling", "None")  # Indicate no scaling is used
    # Model is trained on unscaled data
    model_full_lr = LogisticRegression(solver="liblinear", random_state=42, max_iter=200)  # Added max_iter for convergence with unscaled data
    model_full_lr.fit(X_train_full, y_train_full)  # Use unscaled X_train_full
    y_pred_full_lr = model_full_lr.predict(X_test_full)  # Use unscaled X_test_full
    y_pred_proba_full_lr = model_full_lr.predict_proba(X_test_full)[:, 1]  # Use unscaled X_test_full
    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test_full, y_pred_full_lr))
    mlflow.log_metric("precision", precision_score(y_test_full, y_pred_full_lr, zero_division=0))
    mlflow.log_metric("recall", recall_score(y_test_full, y_pred_full_lr, zero_division=0))
    mlflow.log_metric("f1_score", f1_score(y_test_full, y_pred_full_lr, zero_division=0))
    # Log model
    mlflow.sklearn.log_model(model_full_lr, "model_more_features_lr_noscale")
    mlflow.set_tag("primary_model_path", "model_more_features_lr_noscale")
    # No scaler to log
    # Log a ROC curve as an artifact
    from sklearn.metrics import roc_curve, auc  # Moved import here as it's only used here now
    fpr, tpr, thresholds = roc_curve(y_test_full, y_pred_proba_full_lr)
    roc_auc = auc(fpr, tpr)
    plt.figure()
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:0.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC (LR More Features - No Scale)')
    plt.legend(loc="lower right")
    # plt.savefig("roc_curve_more_features_lr_noscale.png")  # Lines for saving artifact remain commented
    # mlflow.log_artifact("roc_curve_more_features_lr_noscale.png")
    plt.close()
    print("Logistic Regression with more features (no scaling) run logged.")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from sklearn.ensemble import RandomForestClassifier
# --- Run 3: Random Forest with More Features (No Scaling) ---
with mlflow.start_run(run_name="RandomForest_MoreFeatures_NoScale"):
mlflow.log_param("features", "sex, age, pclass, fare")
mlflow.log_param("model_type", "RandomForestClassifier")
mlflow.log_param("scaling", "None") # Indicate no scaling is used
# Manually set parameters for Random Forest
best_params = {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 4} # Example
model_rf = RandomForestClassifier(**best_params, random_state=42)
mlflow.log_params(best_params)
# Model is trained on unscaled data (X_train_full, y_train_full from previous cell)
model_rf.fit(X_train_full, y_train_full)
y_pred_rf = model_rf.predict(X_test_full) # Use unscaled X_test_full
y_pred_proba_rf = model_rf.predict_proba(X_test_full)[:,1] # Use unscaled X_test_full
# Log metrics
mlflow.log_metric("accuracy", accuracy_score(y_test_full, y_pred_rf))
mlflow.log_metric("precision", precision_score(y_test_full, y_pred_rf, zero_division=0))
mlflow.log_metric("recall", recall_score(y_test_full, y_pred_rf, zero_division=0))
mlflow.log_metric("f1_score", f1_score(y_test_full, y_pred_rf, zero_division=0))
# Log model
mlflow.sklearn.log_model(model_rf, "model_random_forest_noscale")
mlflow.set_tag("primary_model_path", "model_random_forest_noscale")
# Log a feature importance plot for Random Forest
if hasattr(model_rf, 'feature_importances_'):
importances = model_rf.feature_importances_
# feature_names = X_full.columns # X_full was original features before split
feature_names = X_train_full.columns # Use columns from the DataFrame used for training
sorted_indices = np.argsort(importances)[::-1]
plt.figure(figsize=(10,6))
plt.title("Feature Importances (Random Forest - No Scale)")
# Use X_train_full.shape[1] for number of features
plt.bar(range(X_train_full.shape[1]), importances[sorted_indices], align="center")
plt.xticks(range(X_train_full.shape[1]), feature_names[sorted_indices], rotation=90)
plt.tight_layout()
# plt.savefig("feature_importances_rf_noscale.png") # Lines for saving artifact remain commented
# mlflow.log_artifact("feature_importances_rf_noscale.png")
plt.close()
print("Random Forest with more features (no scaling) run logged.")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import mlflow
# mlflow.sklearn is already imported, but good to ensure for standalone cell
import mlflow.sklearn
import time
# Name for the registered model in the Model Registry
model_name_reg = "TitanicSurvivalPredictor_DS_Demo"
client = mlflow.tracking.MlflowClient()
# Define criteria for selecting the model
source_experiment_name = "Titanic_Survival_Demo_Experiment"
selection_metric_name = "accuracy"
print(f"Searching for the run with best '{selection_metric_name}' in experiment '{source_experiment_name}'...")
try:
    experiment = client.get_experiment_by_name(source_experiment_name)
    if experiment:
        runs = client.search_runs(
            experiment_ids=[experiment.experiment_id],
            order_by=[f"metrics.{selection_metric_name} DESC"],
            max_results=1
        )
        if runs:
            best_run_to_register = runs[0]
            best_run_id = best_run_to_register.info.run_id
            best_run_metric_value = best_run_to_register.data.metrics.get(selection_metric_name)
            source_run_name_tag = best_run_to_register.data.tags.get('mlflow.runName', 'N/A')
            model_artifact_path_in_source_run = best_run_to_register.data.tags.get("primary_model_path")
            if not model_artifact_path_in_source_run:
                print(f"Error: Run '{source_run_name_tag}' (ID: {best_run_id}) does not have the 'primary_model_path' tag. Cannot register model.")
            else:
                model_uri_for_registration = f"runs:/{best_run_id}/{model_artifact_path_in_source_run}"
                print(f"Found best model from run: '{source_run_name_tag}' (ID: {best_run_id}) with {selection_metric_name}: {best_run_metric_value}")
                print(f"Attempting to register model from URI: {model_uri_for_registration} as '{model_name_reg}'")
                try:
                    registered_model_version = mlflow.register_model(
                        model_uri=model_uri_for_registration,
                        name=model_name_reg,
                        tags={
                            "source_run_id": best_run_id,
                            "selection_metric": f"{selection_metric_name}_{best_run_metric_value}"
                        }
                    )
                    print(f"Successfully registered model '{model_name_reg}', version: {registered_model_version.version}")
                    print(f"Current stage: {registered_model_version.current_stage}")
                    version_description = (
                        f"Model selected from run '{source_run_name_tag}' (ID: {best_run_id}). "
                        f"Achieved {selection_metric_name}: {best_run_metric_value}. "
                        f"Features: {best_run_to_register.data.params.get('features', 'N/A')}. "
                        f"Model Type: {best_run_to_register.data.params.get('model_type', 'N/A')}. "
                        f"Scaling: {best_run_to_register.data.params.get('scaling', 'N/A')}."  # This will now show 'None'
                    )
                    client.update_model_version(
                        name=model_name_reg,
                        version=registered_model_version.version,
                        description=version_description
                    )
                    print(f"Description updated for model version {registered_model_version.version}.")
                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="validation_status", value="candidate")
                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="data_snapshot_id", value="titanic_20250530")  # Example date
                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="picked_by", value="best_accuracy_script")
                    print(f"Tags added to model version {registered_model_version.version}.")
                    alias_name = "best-candidate"
                    client.set_registered_model_alias(
                        name=model_name_reg,
                        alias=alias_name,
                        version=registered_model_version.version
                    )
                    print(f"Alias '{alias_name}' set for model version {registered_model_version.version}.")
                    client.set_registered_model_tag(name=model_name_reg, key="project", value="Titanic Survival Demo")
                    client.set_registered_model_tag(name=model_name_reg, key="domain", value="Passenger Classification")
                    client.set_registered_model_tag(name=model_name_reg, key="model_goal", value="Predict survival")
                    print(f"Tags added to registered model '{model_name_reg}'.")
                except mlflow.exceptions.MlflowException as e:
                    print(f"Error during model registration or update: {e}")
        else:
            print(f"No runs found in experiment '{source_experiment_name}'. Model registration skipped.")
    else:
        print(f"Experiment '{source_experiment_name}' not found. Model registration skipped.")
except Exception as e:
    import traceback
    print(f"An unexpected error occurred in model registration cell: {e}")
    traceback.print_exc()
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
Helpers.save_output_mlflow_model(context=context, model_name=model_name_reg)
Step 5: View Results in the MLflow UI
After running the recipe, the experiment runs will be logged and displayed in the MLflow UI.
You can open the MLflow UI to view details of each run, including parameters, metrics (like accuracy, precision, and F1 score), and artifacts. This allows you to compare results across runs and identify which model performs best.
Step 6: Pick the Best Model
After analysing the results of various experiment runs in the MLflow UI, identify the run that produced the best performance metrics (e.g., highest accuracy or F1 score).
You can then reference the run ID or parameters from this best-performing run in your code recipe to regenerate or finalize the model within the platform. This allows you to consistently reproduce and use the optimal model for deployment or further analysis.
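For instance, with the run ID of the best run copied from the MLflow UI, the logged model can be loaded straight from the tracking server (a minimal sketch; the artifact path must match the one used when the model was logged, e.g. "model_sex_only" in the sample syntax above):
import mlflow.sklearn

best_run_id = "<run-id-from-mlflow-ui>"  # placeholder; copy the actual ID from the MLflow UI
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model_sex_only")
# best_model.predict(new_data)  # reuse the selected model for scoring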
The following helper function, shown at the end of the sample syntax above, logs and saves the trained model as part of MLflow experiment tracking:
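# Save the registered model as a recipe output so it is available within RapidCanvas;
# model_name_reg is the name under which the best model was registered above
Helpers.save_output_mlflow_model(context=context, model_name=model_name_reg)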