MLflow

MLflow Integration Overview

MLflow is an open-source platform designed to help data scientists track, manage, and compare machine learning experiments. It provides a structured approach to evaluating model performance, helping teams identify the best-performing models for their use case.

RapidCanvas supports MLflow directly within workspaces, making it easier than ever to integrate experiment tracking into your model development workflow.

Key Capabilities

Create Experiments: Users can set up and manage multiple runs by experimenting with different algorithms, datasets, and parameters using code recipes. This allows for a wide range of modelling approaches to be tested within the same project.

Track Performance Metrics: Each run can log important metrics such as accuracy, precision, recall, F1 score, and more. These metrics are tracked over time, enabling users to monitor how different models perform under various configurations.

Compare Runs Across Flows: Users can view and compare the results of different runs side by side in the MLflow UI. This makes it easy to identify and select the best-performing model for a specific experiment.

Working with MLflow in RapidCanvas

MLflow is now integrated into RapidCanvas workspaces to help data scientists efficiently manage, track, and compare machine learning experiments. Each workspace in RapidCanvas operates independently, and MLflow instances are isolated per workspace.

You can interact with MLflow through code recipes, provided the necessary libraries are installed in the environment.

To work with MLflow, you'll also need to import the helper functions and initialize MLflow tracking within your code.

Why Use MLflow?

Building a machine learning model is rarely a one-step process. Data scientists often go through multiple iterations, experimenting with different algorithms, parameters, and feature combinations. Manually logging the outcomes of these experiments—like accuracy, precision, and F1 score—in spreadsheets is both tedious and error-prone.

MLflow simplifies this by logging key metrics and parameters for each model run. This not only saves time but also makes it easy to compare results across multiple experiments and choose the best-performing model.
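
As a rough, self-contained sketch of what this looks like with the standard MLflow API (the dataset, experiment name, and parameter values below are placeholders, not part of the RapidCanvas sample):

import mlflow
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Toy data stands in for your own project dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Experiment and run names here are arbitrary placeholders
mlflow.set_experiment("Demo_Experiment")

with mlflow.start_run(run_name="logreg_baseline"):
    # Record what was tried
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("max_iter", 1000)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Record how it performed
    mlflow.log_metric("accuracy", accuracy_score(y_test, y_pred))
    mlflow.log_metric("f1_score", f1_score(y_test, y_pred))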

How It Works – Example with Titanic Dataset

Imagine you're building models using the Titanic dataset. You might perform the following:

  • Run 1: Use a specific feature with a basic algorithm; log accuracy, precision, F1 score.

  • Run 2: Try a different algorithm with altered parameters or features; track the new metrics.

  • Run 3: Use another algorithm with different parameters again.

Each of these executions becomes a run in MLflow. When you open the MLflow UI, all the runs are listed with their respective metrics, allowing you to easily compare and determine which model performed best. If you are running hundreds of experiments, MLflow will keep track of every run—helping you analyse results at scale.

You can also write code in a code recipe to programmatically compare runs and select the one with the best metric (e.g., highest accuracy). The selected run can then be used to generate the best model within RapidCanvas using the provided helper functions.
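
For instance, a minimal sketch of selecting the best run programmatically with the standard MLflow client (the experiment name matches the Titanic example used in Step 4; adapt it to your own experiment):

import mlflow

client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name("Titanic_Survival_Demo_Experiment")

if experiment:
    # Ask the tracking server for the run with the highest logged accuracy
    runs = client.search_runs(
        experiment_ids=[experiment.experiment_id],
        order_by=["metrics.accuracy DESC"],
        max_results=1,
    )
    if runs:
        best_run = runs[0]
        print("Best run:", best_run.info.run_id,
              "accuracy:", best_run.data.metrics.get("accuracy"))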

Try It Out

RapidCanvas provides sample MLflow syntax to help you get started quickly. To explore and test MLflow in your code recipes:

  1. Open any code recipe.

  2. Click on the Syntax option.

  3. Choose the sample MLflow scripts provided to understand how to structure your experiments.

Tip: Each workspace has an independent MLflow instance, so all experiment tracking is isolated per workspace.

How to Access MLflow in RapidCanvas

To start using MLflow in RapidCanvas, simply navigate to the main menu of any workspace. From there, you’ll find the MLflow option readily available—just click to access and begin managing your machine learning experiments.

Requirements to Use MLflow

To run ML flows using code recipes in RapidCanvas, certain Python libraries must be installed in the environment where the flows are executed.

You can install these libraries in two ways:

  • At the environment level – making them available across all recipes using that environment.

  • At the recipe level – making them available only for a specific recipe.

Ensure that the necessary MLflow-related libraries are installed before executing any ML flow to avoid runtime issues.

mlflow-skinny==2.22.0
urllib3==1.26.18
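
If you install them at the recipe level, the sample syntax in Step 4 below simply lists them (together with any other packages the recipe needs) in a comment block at the top of the recipe code:

"""
Please ensure the following packages are installed in the environment either by
checking the available internal packages or specifying them in the requirements.
seaborn
mlflow-skinny==2.22.0
urllib3==1.26.18
"""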

Get Started: Using MLflow in RapidCanvas

Follow these steps to begin using MLflow for tracking and comparing machine learning experiments within the RapidCanvas platform.

Step 1: Navigate to the Code Recipe from the Canvas

To begin writing code for ML experiments using different parameters (features) and algorithms, first open your project canvas. From there, select the Code Recipe option to access the coding interface where you can implement and run your MLflow experiments.

Step 2: Import Required Functions

To use MLflow within your code recipes, make sure to import the necessary helper functions:

from utils.notebookhelpers.helpers import Helpers
from utils.notebookhelpers.mlflow_utils import MLflowUtils

Step 3: Initialize Helper Functions

Set up and initialize the imported helper functions to enable interaction with MLflow. These functions help manage runs, log parameters and metrics, and streamline experiment tracking.

context = Helpers.getOrCreateContext(contextId='contextId', localVars=locals())
MLflowUtils.init(context)

Step 4: Use Sample Syntax to Experiment with ML Flows

Copy the sample MLflow syntax into the Code tab of your code recipe to start experimenting with multiple ML flows. This code helps you log metrics, compare runs, and evaluate different models efficiently.

Example Syntax
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
"""
Please ensure the following packages are installed in the environment either by checking the available internal packages or specifying them in the requirements.
seaborn
mlflow-skinny==2.22.0
urllib3==1.26.18
"""
# Required imports

from utils.notebookhelpers.helpers import Helpers
from utils.notebookhelpers.mlflow_utils import MLflowUtils

context = Helpers.getOrCreateContext(contextId='contextId', localVars=locals())

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
MLflowUtils.init(context)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import mlflow
import mlflow.sklearn
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
import numpy as np

# Load and prepare data
df = sns.load_dataset("titanic")[["survived", "sex", "age", "pclass", "fare"]].dropna() # Add more features
X = df.drop("survived", axis=1)
y = df["survived"].values

# Preprocessing for 'sex'
X['sex'] = X['sex'].map({'male': 1, 'female': 0})

# For simplicity in this demo, we'll just use 'sex' for the baseline
X_baseline = X[["sex"]].values.reshape(-1, 1)
y_baseline = y

X_train_base, X_test_base, y_train_base, y_test_base = train_test_split(X_baseline, y_baseline, test_size=0.3, random_state=42, stratify=y_baseline)

# Set experiment
mlflow.set_experiment("Titanic_Survival_Demo_Experiment")

# --- Run 1: Baseline Model (Sex Only) ---
with mlflow.start_run(run_name="Baseline_Sex_Only"):
    mlflow.log_param("features", "sex_only")
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("scaling", "None") # Explicitly state no scaling

    model_base = LogisticRegression(solver="liblinear", random_state=42)
    model_base.fit(X_train_base, y_train_base)

    y_pred_base = model_base.predict(X_test_base)
    y_pred_proba_base = model_base.predict_proba(X_test_base)[:, 1] # For ROC later

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test_base, y_pred_base))
    mlflow.log_metric("precision", precision_score(y_test_base, y_pred_base, zero_division=0))
    mlflow.log_metric("recall", recall_score(y_test_base, y_pred_base, zero_division=0))
    mlflow.log_metric("f1_score", f1_score(y_test_base, y_pred_base, zero_division=0))

    # Log model
    mlflow.sklearn.log_model(model_base, "model_sex_only")
    mlflow.set_tag("primary_model_path", "model_sex_only")

    print("Baseline run logged.")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Prepare data with more features
X_full_df = X[['sex', 'age', 'pclass', 'fare']] # Using already processed X as DataFrame
y_full = y # y is already a NumPy array

# Using DataFrame for X_full_df to retain column names for feature importance plot later
X_train_full, X_test_full, y_train_full, y_test_full = train_test_split(
    X_full_df, y_full, test_size=0.3, random_state=42, stratify=y_full
)


# --- Run 2: Logistic Regression with More Features (No Scaling) ---
with mlflow.start_run(run_name="LogisticRegression_MoreFeatures_NoScale"):
    mlflow.log_param("features", "sex, age, pclass, fare")
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("scaling", "None") # Indicate no scaling is used

    # Model is trained on unscaled data
    model_full_lr = LogisticRegression(solver="liblinear", random_state=42, max_iter=200) # Added max_iter for convergence with unscaled data
    model_full_lr.fit(X_train_full, y_train_full) # Use unscaled X_train_full

    y_pred_full_lr = model_full_lr.predict(X_test_full) # Use unscaled X_test_full
    y_pred_proba_full_lr = model_full_lr.predict_proba(X_test_full)[:,1] # Use unscaled X_test_full

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test_full, y_pred_full_lr))
    mlflow.log_metric("precision", precision_score(y_test_full, y_pred_full_lr, zero_division=0))
    mlflow.log_metric("recall", recall_score(y_test_full, y_pred_full_lr, zero_division=0))
    mlflow.log_metric("f1_score", f1_score(y_test_full, y_pred_full_lr, zero_division=0))

    # Log model
    mlflow.sklearn.log_model(model_full_lr, "model_more_features_lr_noscale")
    mlflow.set_tag("primary_model_path", "model_more_features_lr_noscale")
    # No scaler to log

    # Log a ROC curve as an artifact
    from sklearn.metrics import roc_curve, auc # Moved import here as it's only used here now
    fpr, tpr, thresholds = roc_curve(y_test_full, y_pred_proba_full_lr)
    roc_auc = auc(fpr, tpr)
    plt.figure()
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:0.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC (LR More Features - No Scale)')
    plt.legend(loc="lower right")
    # plt.savefig("roc_curve_more_features_lr_noscale.png") # Lines for saving artifact remain commented
    # mlflow.log_artifact("roc_curve_more_features_lr_noscale.png")
    plt.close() 

    print("Logistic Regression with more features (no scaling) run logged.")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from sklearn.ensemble import RandomForestClassifier

# --- Run 3: Random Forest with More Features (No Scaling) ---
with mlflow.start_run(run_name="RandomForest_MoreFeatures_NoScale"):
    mlflow.log_param("features", "sex, age, pclass, fare")
    mlflow.log_param("model_type", "RandomForestClassifier")
    mlflow.log_param("scaling", "None") # Indicate no scaling is used

    # Manually set parameters for Random Forest
    best_params = {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 4} # Example
    model_rf = RandomForestClassifier(**best_params, random_state=42)
    mlflow.log_params(best_params)

    # Model is trained on unscaled data (X_train_full, y_train_full from previous cell)
    model_rf.fit(X_train_full, y_train_full)

    y_pred_rf = model_rf.predict(X_test_full) # Use unscaled X_test_full
    y_pred_proba_rf = model_rf.predict_proba(X_test_full)[:,1] # Use unscaled X_test_full

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test_full, y_pred_rf))
    mlflow.log_metric("precision", precision_score(y_test_full, y_pred_rf, zero_division=0))
    mlflow.log_metric("recall", recall_score(y_test_full, y_pred_rf, zero_division=0))
    mlflow.log_metric("f1_score", f1_score(y_test_full, y_pred_rf, zero_division=0))

    # Log model
    mlflow.sklearn.log_model(model_rf, "model_random_forest_noscale")
    mlflow.set_tag("primary_model_path", "model_random_forest_noscale")

    # Log a feature importance plot for Random Forest
    if hasattr(model_rf, 'feature_importances_'):
        importances = model_rf.feature_importances_
        # feature_names = X_full.columns # X_full was original features before split
        feature_names = X_train_full.columns # Use columns from the DataFrame used for training
        sorted_indices = np.argsort(importances)[::-1]
        
        plt.figure(figsize=(10,6))
        plt.title("Feature Importances (Random Forest - No Scale)")
        # Use X_train_full.shape[1] for number of features
        plt.bar(range(X_train_full.shape[1]), importances[sorted_indices], align="center")
        plt.xticks(range(X_train_full.shape[1]), feature_names[sorted_indices], rotation=90)
        plt.tight_layout()
        # plt.savefig("feature_importances_rf_noscale.png") # Lines for saving artifact remain commented
        # mlflow.log_artifact("feature_importances_rf_noscale.png")
        plt.close()

    print("Random Forest with more features (no scaling) run logged.")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import mlflow
# mlflow.sklearn is already imported, but good to ensure for standalone cell
import mlflow.sklearn
import time

# Name for the registered model in the Model Registry
model_name_reg = "TitanicSurvivalPredictor_DS_Demo"
client = mlflow.tracking.MlflowClient()

# Define criteria for selecting the model
source_experiment_name = "Titanic_Survival_Demo_Experiment" 
selection_metric_name = "accuracy" 

print(f"Searching for the run with best '{selection_metric_name}' in experiment '{source_experiment_name}'...")

try:
    experiment = client.get_experiment_by_name(source_experiment_name)

    if experiment:
        runs = client.search_runs(
            experiment_ids=[experiment.experiment_id],
            order_by=[f"metrics.{selection_metric_name} DESC"], 
            max_results=1 
        )

        if runs:
            best_run_to_register = runs[0]
            best_run_id = best_run_to_register.info.run_id
            best_run_metric_value = best_run_to_register.data.metrics.get(selection_metric_name)
            source_run_name_tag = best_run_to_register.data.tags.get('mlflow.runName', 'N/A')
            model_artifact_path_in_source_run = best_run_to_register.data.tags.get("primary_model_path")

            if not model_artifact_path_in_source_run:
                print(f"Error: Run '{source_run_name_tag}' (ID: {best_run_id}) does not have the 'primary_model_path' tag. Cannot register model.")
            else:
                model_uri_for_registration = f"runs:/{best_run_id}/{model_artifact_path_in_source_run}"

                print(f"Found best model from run: '{source_run_name_tag}' (ID: {best_run_id}) with {selection_metric_name}: {best_run_metric_value}")
                print(f"Attempting to register model from URI: {model_uri_for_registration} as '{model_name_reg}'")

                try:
                    registered_model_version = mlflow.register_model(
                        model_uri=model_uri_for_registration,
                        name=model_name_reg,
                        tags={
                            "source_run_id": best_run_id,
                            "selection_metric": f"{selection_metric_name}_{best_run_metric_value}"
                        }
                    )
                    print(f"Successfully registered model '{model_name_reg}', version: {registered_model_version.version}")
                    print(f"Current stage: {registered_model_version.current_stage}")

                    version_description = (
                        f"Model selected from run '{source_run_name_tag}' (ID: {best_run_id}).\n"
                        f"Achieved {selection_metric_name}: {best_run_metric_value}.\n"
                        f"Features: {best_run_to_register.data.params.get('features', 'N/A')}.\n"
                        f"Model Type: {best_run_to_register.data.params.get('model_type', 'N/A')}.\n"
                        f"Scaling: {best_run_to_register.data.params.get('scaling', 'N/A')}."  # This will now show 'None'
                    )
                    client.update_model_version(
                        name=model_name_reg,
                        version=registered_model_version.version,
                        description=version_description
                    )
                    print(f"Description updated for model version {registered_model_version.version}.")

                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="validation_status", value="candidate")
                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="data_snapshot_id", value="titanic_20250530") # Example date
                    client.set_model_version_tag(name=model_name_reg, version=registered_model_version.version, key="picked_by", value="best_accuracy_script")
                    print(f"Tags added to model version {registered_model_version.version}.")

                    alias_name = "best-candidate" 
                    client.set_registered_model_alias(
                        name=model_name_reg,
                        alias=alias_name,
                        version=registered_model_version.version
                    )
                    print(f"Alias '{alias_name}' set for model version {registered_model_version.version}.")

                    client.set_registered_model_tag(name=model_name_reg, key="project", value="Titanic Survival Demo")
                    client.set_registered_model_tag(name=model_name_reg, key="domain", value="Passenger Classification")
                    client.set_registered_model_tag(name=model_name_reg, key="model_goal", value="Predict survival")
                    print(f"Tags added to registered model '{model_name_reg}'.")

                except mlflow.exceptions.MlflowException as e:
                    print(f"Error during model registration or update: {e}")
        else:
            print(f"No runs found in experiment '{source_experiment_name}'. Model registration skipped.")
    else:
        print(f"Experiment '{source_experiment_name}' not found. Model registration skipped.")

except Exception as e:
    import traceback
    print(f"An unexpected error occurred in model registration cell: {e}")
    traceback.print_exc()
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
Helpers.save_output_mlflow_model(context=context, model_name=model_name_reg)

Step 5: View Results in the MLflow UI

After running the recipe, the experiment runs will be logged and displayed in the MLflow UI. You can open the MLflow UI to view details of each run, including parameters, metrics (like accuracy, precision, and F1 score), and artifacts. This allows you to compare results across runs and identify which model performs best.

Step 6: Pick the Best Model

After analysing the results of various experiment runs in the MLflow UI, identify the run that produced the best performance metrics (e.g., highest accuracy or F1 score).

You can then reference the run ID or parameters from this best-performing run in your code recipe to regenerate or finalize the model within the platform. This allows you to consistently reproduce and use the optimal model for deployment or further analysis.
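
As a rough sketch using the standard MLflow API (the run ID is a placeholder you would copy from the MLflow UI, and the artifact path shown matches the Random Forest run in the Step 4 sample), reloading the best model for further use might look like this:

import mlflow.sklearn

# Placeholders: substitute the run ID of your best run and the artifact path it logged
best_run_id = "<run-id-from-the-mlflow-ui>"
model_artifact_path = "model_random_forest_noscale"

# Rebuild the exact model object that was logged during that run
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/{model_artifact_path}")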

The following function logs and saves the trained model as part of MLflow experiment tracking:

Helpers.save_output_mlflow_model(context=context, model_name=model_name_reg)


If you’re new to MLflow, you can learn more by reading the official MLflow documentation.
MLflow documentation