Helper Functions
This page lists all helper functions supported within RapidCanvas.
Import Helper Functions
from utils.notebookhelpers.helpers import Helpers
Context Management
getOrCreateContext
@staticmethod
def getOrCreateContext(contextId, localVars, entities=None):
Purpose: Creates or retrieves a context object which is essential for most helper functions. This is typically the first function called in a notebook.
Parameters:
contextId: String identifier for the context (used for local testing)
localVars: Local variables in the current environment (pass locals())
entities: Optional dictionary of local datasets with entity name as key and file path as value
Returns: A context dictionary that will be used by other helper functions
Example:
# Initialize context at the beginning of your notebook
context = Helpers.getOrCreateContext(
    contextId="my_test_context",
    localVars=locals(),
    entities={
        "customers": "/path/to/customers.csv",
        "orders": "/path/to/orders.csv"
    }
)
How it works:
Checks if internalContext already exists in local variables
For production environments, returns an empty context
For local execution, it either creates a new context with entity paths (if provided) or retrieves previously prepared context data
Notes:
This function must be called at the beginning of your notebook
The context object is used by most other helper functions
For local testing, you can specify local file paths for entities
The context now includes additional fields such as:
vector_stores: List of vector stores for embedding-based retrieval
files_data: List of file paths and metadata
global_vars: Dictionary of global variables
Parameter and Variable Management
addParam
@staticmethod
def addParam(context, paramKey, paramVal):
Purpose: Adds a parameter to the context dictionary.
Parameters:
context: The context dictionary
paramKey: Key name for the parameter
paramVal: Value of the parameter
Example:
# Store a configuration value in the context
Helpers.addParam(context, "max_iterations", 100)
getParam
@staticmethod
def getParam(context, param):
Purpose: Retrieves a parameter value from the context by its key.
Parameters:
context: The context dictionary
param: Key name of the parameter to retrieve
Returns: The value of the parameter, or None if the parameter doesn't exist
Example:
# Get a configuration value from the context
max_iterations = Helpers.getParam(context, "max_iterations")
if max_iterations is None:
    max_iterations = 50  # Default value if not found
getAllParams
@staticmethod
def getAllParams(context):
Purpose: Gets a list of all parameter keys available in the context.
Parameters:
context: The context dictionary
Returns: List of all parameter keys
Example:
# List all available parameters in the context
all_params = Helpers.getAllParams(context)
print("Available parameters:", all_params)
get_global_var
@staticmethod
def get_global_var(context, key):
Purpose: Retrieves a global variable from the context by key.
Parameters:
context: The context dictionary
key: Key name of the global variable to retrieve
Returns: The value of the global variable, or None if not found
Example:
# Get a global variable from the context
environment = Helpers.get_global_var(context, "environment")
if environment == "production":
    # Apply production-specific logic
    pass
get_secret
@staticmethod
def get_secret(context: dict, key: str):
Purpose: Securely retrieves and decrypts secret values from the context.
Parameters:
context: The context dictionary
key: Key name of the secret to retrieve
Returns: The decrypted secret value
Raises: Exception if the secret is not found
Example:
# Get a secret API key
try:
    api_key = Helpers.get_secret(context, "API_KEY")
    # Use the API key for authentication
except Exception as e:
    print(f"Error retrieving secret: {e}")
How it works:
Checks if the secret exists in the context
If it's a string, decodes the base64 encoding
Decrypts the value using AES encryption with environment keys
Removes control characters from the result
If not found in context, tries to get from environment variables
Raises an exception if the secret is not found anywhere
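The lookup order above can be sketched roughly as follows. This is not the RapidCanvas implementation: resolve_secret, the "secrets" key inside the context, and the pluggable decrypt argument are all assumptions made for illustration, and the AES step is replaced by an identity function so the example needs only the standard library.

```python
import base64
import os
import re


def resolve_secret(context, key, decrypt=lambda raw: raw):
    """Sketch of the documented lookup order for a secret."""
    secrets = context.get("secrets", {})  # hypothetical storage key
    if key in secrets:
        value = secrets[key]
        if isinstance(value, str):
            value = base64.b64decode(value)  # undo the base64 layer
        plain = decrypt(value).decode("utf-8")  # the AES step would plug in here
        # Drop control characters such as leftover padding bytes.
        return re.sub(r"[\x00-\x1f\x7f]", "", plain)
    if key in os.environ:  # fall back to environment variables
        return os.environ[key]
    raise Exception(f"Secret '{key}' not found")


# Demo: a base64-encoded value with two trailing padding bytes
encoded = base64.b64encode(b"abc123\x05\x05").decode("ascii")
demo_context = {"secrets": {"API_KEY": encoded}}
api_key = resolve_secret(demo_context, "API_KEY")
```

Passing a real decryption callable as decrypt would turn this into a full pipeline; the key management around it is outside the scope of the sketch.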
Notes:
This function handles decryption of sensitive information
Secrets are stored encrypted in the context for security
Entity Data Management
getAllEntities
@staticmethod
def getAllEntities(context):
Purpose: Returns a list of all entity names available in the context.
Parameters:
context: The context dictionary
Returns: List of entity names
Example:
# Get all available entities
entities = Helpers.getAllEntities(context)
print("Available entities:", entities)
getEntityData
@staticmethod
def getEntityData(context, entityName, inferDTypesFromSchema=False, numRows=None, pandas_lib=None):
Purpose: Loads an entity (dataset) as a pandas DataFrame.
Parameters:
context: The context dictionary
entityName: Name of the entity to load
inferDTypesFromSchema: If True, will use schema to set data types
numRows: Optional limit on number of rows to read
pandas_lib: Optional pandas library to use (defaults to standard pandas)
Returns: DataFrame containing the entity data
Raises: Exception if the entity is not found
Example:
# Load customer data
try:
    customers_df = Helpers.getEntityData(context, "customers")
    # Load just the first 100 rows
    sample_df = Helpers.getEntityData(context, "customers", numRows=100)
    # Process the data
    customers_df['full_name'] = customers_df['first_name'] + ' ' + customers_df['last_name']
except Exception as e:
    print(f"Error loading entity: {e}")
How it works:
Looks up the entity in the context's entity paths
Gets the schema if needed
Downloads the entity file if necessary
Reads the data into a DataFrame
Raises an informative exception if the entity doesn't exist
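The lookup, row limit, and error handling described above can be approximated with the standard library. This is a simplified sketch under stated assumptions: read_entity_rows and the "entities" key in the context are hypothetical, the schema and download steps are omitted, and rows come back as a list of dictionaries rather than a DataFrame.

```python
import csv
import itertools
import os
import tempfile


def read_entity_rows(context, entity_name, num_rows=None):
    """Look up an entity path and read at most `num_rows` records from its CSV."""
    entity_paths = context.get("entities", {})  # hypothetical key
    if entity_name not in entity_paths:
        available = ", ".join(sorted(entity_paths)) or "none"
        raise Exception(
            f"Entity '{entity_name}' not found. Available entities: {available}"
        )
    with open(entity_paths[entity_name], newline="") as f:
        reader = csv.DictReader(f)
        # islice with stop=None reads every row; otherwise it stops early.
        return list(itertools.islice(reader, num_rows))


# Demo with a throwaway CSV file
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "customers.csv")
with open(demo_path, "w", newline="") as f:
    f.write("id,first_name\n1,Ada\n2,Grace\n3,Edsger\n")

demo_context = {"entities": {"customers": demo_path}}
sample = read_entity_rows(demo_context, "customers", num_rows=2)
```

With pandas, the same row limit corresponds to the nrows argument of read_csv.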
load_all_entities
@staticmethod
def load_all_entities(context):
Purpose: Loads all entities into a dictionary of DataFrames.
Parameters:
context: The context dictionary
Returns: Dictionary with entity names as keys and DataFrames as values
Example:
# Load all available entities as DataFrames
entities_dict = Helpers.load_all_entities(context)
# Access individual DataFrames
customers_df = entities_dict['customers']
orders_df = entities_dict['orders']
# Process multiple datasets
for entity_name, df in entities_dict.items():
    print(f"Entity: {entity_name}, Rows: {len(df)}")
getEntityFilePath
@staticmethod
def getEntityFilePath(context, entityName):
Purpose: Gets the file path for a specific entity.
Parameters:
context: The context dictionary
entityName: Name of the entity
Returns: File path string, or None if entity doesn't exist
Example:
# Get the path to the customers entity file
entity_path = Helpers.getEntityFilePath(context, "customers")
if entity_path:
    print(f"Entity file is located at: {entity_path}")
else:
    print("Entity not found")
get_data_from_source
@staticmethod
def get_data_from_source(source_type: DataSourceType, source: str, name: str = None, **options):
Purpose: Retrieves data from various data sources.
Parameters:
source_type: Type of data source (enum from DataSourceType)
source: Path or identifier for the data source
name: Optional name for the data source
**options: Additional options for the data source
Returns: Data from the source (typically a DataFrame)
Example:
from utils.rcclient.enums import DataSourceType
# Read data from a CSV file
csv_data = Helpers.get_data_from_source(
    source_type=DataSourceType.CSV,
    source="path/to/file.csv"
)
# Read data from a database
db_data = Helpers.get_data_from_source(
    source_type=DataSourceType.SQL,
    source="SELECT * FROM customers",
    connection_string="postgresql://user:pass@localhost/dbname"
)
write_data_to_source
@staticmethod
def write_data_to_source(df, source_type: DataSourceType, target: str, name: str = None, **options):
Purpose: Writes data to various data destinations.
Parameters:
df: DataFrame to write
source_type: Type of data source (enum from DataSourceType)
target: Path or identifier for the data destination
name: Optional name for the data source
**options: Additional options for the data source
Example:
from utils.rcclient.enums import DataSourceType
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
# Write to a CSV file
Helpers.write_data_to_source(
    df=df,
    source_type=DataSourceType.CSV,
    target="path/to/output.csv"
)
# Write to a database table
Helpers.write_data_to_source(
    df=df,
    source_type=DataSourceType.SQL,
    target="my_table",
    connection_string="postgresql://user:pass@localhost/dbname",
    if_exists="replace"  # Options: fail, replace, append
)
Artifact Management
getOrCreateArtifactsDir
@staticmethod
def getOrCreateArtifactsDir(context, artifactsId, purgeOld=False):
Purpose: Creates or retrieves a directory for storing artifacts (models, files, etc.)
Parameters:
context: The context dictionary
artifactsId: Identifier for the artifacts collection (auto-generated if None)
purgeOld: If True, will create a fresh directory even if it exists
Returns: Path to the artifacts directory
Example:
import os
import pickle
# Create an artifacts directory
artifacts_dir = Helpers.getOrCreateArtifactsDir(context, "my_model_files")
# Save a model file to the artifacts directory (model is assumed to be a trained object defined earlier)
model_path = os.path.join(artifacts_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
print(f"Artifacts directory: {artifacts_dir}")
print("Use artifacts ID 'my_model_files' to download these artifacts later")
How it works:
If no artifactsId is provided, generates a random UUID
Adds the artifactsId to the context for tracking
Constructs the local path for the artifacts directory
Either creates a fresh directory or downloads existing artifacts
Returns the path to the artifacts directory
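The steps above can be sketched with the standard library alone. This is a local approximation, not the actual helper: get_or_create_artifacts_dir, the "artifacts_ids" context key, and the base_dir argument are hypothetical, and the download of existing remote artifacts is left out.

```python
import os
import shutil
import tempfile
import uuid


def get_or_create_artifacts_dir(context, artifacts_id=None, purge_old=False, base_dir=None):
    """Local-only sketch: generate an ID if needed, track it, and prepare the directory."""
    if artifacts_id is None:
        artifacts_id = str(uuid.uuid4())  # random ID when none is given
    tracked = context.setdefault("artifacts_ids", [])  # hypothetical tracking key
    if artifacts_id not in tracked:
        tracked.append(artifacts_id)
    base_dir = base_dir or os.path.join(tempfile.gettempdir(), "artifacts_sketch")
    path = os.path.join(base_dir, artifacts_id)
    if purge_old and os.path.isdir(path):
        shutil.rmtree(path)  # start from a fresh directory
    os.makedirs(path, exist_ok=True)
    return path


# Demo: reuse keeps files, purge_old=True removes them
demo_base = tempfile.mkdtemp()
demo_context = {}
first = get_or_create_artifacts_dir(demo_context, "my_model_files", base_dir=demo_base)
with open(os.path.join(first, "model.pkl"), "wb") as f:
    f.write(b"placeholder")
again = get_or_create_artifacts_dir(demo_context, "my_model_files", base_dir=demo_base)
kept_after_reuse = os.path.exists(os.path.join(again, "model.pkl"))
fresh = get_or_create_artifacts_dir(
    demo_context, "my_model_files", purge_old=True, base_dir=demo_base
)
```

Tracking the ID in the context is what would let a later save step know which directories to upload.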
Notes:
Artifacts are automatically uploaded when you call Helpers.save(context)
Use the same artifactsId to access the same artifacts in different sessions
downloadArtifacts
@staticmethod
def downloadArtifacts(context, artifactsId):
Purpose: Downloads artifacts from storage to the local filesystem.
Parameters:
context: The context dictionary
artifactsId: ID of the artifacts to download
Returns: Dictionary of file names and their paths
Example:
import pickle
# Download model artifacts by ID
files_dict = Helpers.downloadArtifacts(context, "my_model_files")
# Access specific files
model_path = files_dict.get("model.pkl")
if model_path:
    with open(model_path, "rb") as f:
        model = pickle.load(f)
list_artifact_files
@staticmethod
def list_artifact_files(context, artifacts_id):
Purpose: Lists files in an artifact without downloading them.
Parameters:
context: The context dictionary
artifacts_id: ID of the artifacts to list
Returns: List of file names in the artifact
Example:
# See what files are available in an artifact
file_list = Helpers.list_artifact_files(context, "my_model_files")
print("Available files:", file_list)
# Check if a specific file exists
if "model.pkl" in file_list:
    # Download only that file
    model_path = Helpers.download_artifact_file(context, "my_model_files", "model.pkl")
download_artifact_file
@staticmethod
def download_artifact_file(context, artifacts_id, file_name):
Purpose: Downloads a single file from an artifact.
Parameters:
context: The context dictionary
artifacts_id: ID of the artifact containing the file
file_name: Name of the file to download
Returns: Path to the downloaded file
Example:
import pickle
# Download just the model file from an artifact
model_path = Helpers.download_artifact_file(context, "my_model_files", "model.pkl")
# Load the model
with open(model_path, "rb") as f:
    model = pickle.load(f)
get_artifact
@staticmethod
def get_artifact(context, artifact_id, artifact_name):
Purpose: Gets the path to a specific artifact file, downloading it if necessary.
Parameters:
context: The context dictionary
artifact_id: ID of the artifact
artifact_name: Name of the specific file in the artifact
Returns: Path to the artifact file
Example:
import json
# Get a specific file from an artifact
config_path = Helpers.get_artifact(context, "project_files", "config.json")
# Read the config file
with open(config_path, "r") as f:
    config = json.load(f)
uploadArtifacts
@staticmethod
def uploadArtifacts(context):
Purpose: Uploads all artifacts in the context to remote storage.
Parameters:
context: The context dictionary
Example:
# Upload all artifacts registered in the context
Helpers.uploadArtifacts(context)
Notes:
This is automatically called by Helpers.save(context)
Only uploads artifacts with IDs listed in the context
Output Management
createOutputCollection
@staticmethod
def createOutputCollection(context):
Purpose: Creates or retrieves a template output collection for storing outputs.
Parameters:
context: The context dictionary
Returns: TemplateOutputCollection object
Example:
# Create or get the output collection
output_collection = Helpers.createOutputCollection(context)
Notes:
Most output creation functions add to this collection automatically
This is called internally by other output functions
getOutputCollection
@staticmethod
def getOutputCollection(context):
Purpose: Gets the existing template output collection from the context.
Parameters:
context: The context dictionary
Returns: TemplateOutputCollection object or None if not found
Example:
# Get the output collection
output_collection = Helpers.getOutputCollection(context)
if output_collection:
    # Check how many outputs are in the collection
    print(f"Number of outputs: {len(output_collection.templateOutputs)}")
createTemplateOutput
@staticmethod
def createTemplateOutput(context, outputName: str, outputType: OutputType, data=None,
                         dataType: FileType = FileType.CSV, outputFileName: str = None,
                         custom_params: dict = {}, metadata: dict = {},
                         description: str = None, group: str = None):
Purpose: Creates a template output of any type.
Parameters:
context: The context dictionary
outputName: Name of the output
outputType: Type of output from OutputType enum
data: Data for the output (typically DataFrame)
dataType: Type of data file from FileType enum (CSV, PARQUET, etc.)
outputFileName: Optional custom filename
custom_params: Additional parameters for the output
metadata: Metadata for the output
description: Optional description
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
from utils.dtos.templateOutput import OutputType, FileType
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
# Create a CSV output
output = Helpers.createTemplateOutput(
    context=context,
    outputName="my_output",
    outputType=OutputType.ENTITY,
    data=df,
    dataType=FileType.CSV,
    description="Sample data output",
    group="Data Outputs"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(output)
Notes:
This is a general-purpose output creation function
Specialized versions exist for specific output types
Automatically handles file generation and storage
createTemplateOutputDataset
@staticmethod
def createTemplateOutputDataset(context, outputName, dataFrame):
Purpose: Creates a dataset (entity) output from a DataFrame.
Parameters:
context: The context dictionary
outputName: Name of the dataset
dataFrame: Pandas DataFrame with the data
Returns: TemplateOutput object
Example:
import pandas as pd
# Create a processed dataset
processed_df = pd.DataFrame({'id': [1, 2, 3], 'processed_value': [10, 20, 30]})
# Create a dataset output
dataset_output = Helpers.createTemplateOutputDataset(
    context=context,
    outputName="processed_data",
    dataFrame=processed_df
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(dataset_output)
Notes:
This is a convenience wrapper around createTemplateOutput
Automatically sets outputType to ENTITY and dataType to PARQUET
create_template_output_file
@staticmethod
def create_template_output_file(context, output_name, file_contents, file_type: FileType = FileType.TEXT):
Purpose: Creates a file output with the specified content.
Parameters:
context: The context dictionary
output_name: Name of the output
file_contents: String content of the file
file_type: Type of file from FileType enum (TEXT, JSON, MARKDOWN, HTML). Defaults to TEXT if not specified.
Returns: TemplateOutput object
Example:
import json
from utils.dtos.templateOutput import FileType
# Create a JSON configuration file
config = {
    "model_params": {
        "n_estimators": 100,
        "max_depth": 5
    }
}
json_content = json.dumps(config, indent=2)
# Create a file output
file_output = Helpers.create_template_output_file(
    context=context,
    output_name="model_config",
    file_contents=json_content,
    file_type=FileType.JSON
)
# Create a text file output with default file type
text_output = Helpers.create_template_output_file(
    context=context,
    output_name="readme",
    file_contents="This is a simple text file."
    # file_type parameter omitted, defaults to FileType.TEXT
)
# Add the outputs to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(file_output)
output_collection.addOutput(text_output)
create_template_output_artifact
@staticmethod
def create_template_output_artifact(context, artifact_name):
Purpose: Creates an artifact output for storing files.
Parameters:
context: The context dictionary
artifact_name: Name of the artifact (should match an artifactsId)
Returns: TemplateOutput object
Example:
import os
import pickle
# First create an artifacts directory
artifacts_dir = Helpers.getOrCreateArtifactsDir(context, "model_files")
# Save model files to the directory (model is assumed to be defined earlier)
model_path = os.path.join(artifacts_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
# Create an artifact output
artifact_output = Helpers.create_template_output_artifact(
    context=context,
    artifact_name="model_files"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(artifact_output)
Notes:
The artifact_name should match the artifactsId used with getOrCreateArtifactsDir
This registers the artifact for proper handling when the context is saved
Chart Output Functions
create_template_output_chart
@staticmethod
def create_template_output_chart(context, title, metadata={}, description=None, group=None):
Purpose: Creates a base chart output without specifying the chart type or data.
Parameters:
context: The context dictionary
title: Title of the chart
metadata: Optional metadata for the chart
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
# Create a base chart output
chart_output = Helpers.create_template_output_chart(
    context=context,
    title="Data Distribution",
    description="Distribution of values across categories",
    group="Analysis Charts"
)
Notes:
This creates a basic chart output container
You'll typically use one of the more specific chart functions instead
createTemplateOutputEChart
@staticmethod
def createTemplateOutputEChart(context, chartTitle, dataFrame, chartType=ChartType.TABLE,
params={}, description=None, group=None):
Purpose: Creates a chart output using ECharts visualization library.
Parameters:
context: The context dictionary
chartTitle: Title of the chart
dataFrame: Pandas DataFrame with the data to visualize
chartType: Type of chart from ChartType enum (default is TABLE)
params: Optional parameters for chart configuration
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
from utils.dtos.templateOutput import ChartType
import pandas as pd
# Create a DataFrame with sample data
data = {
    'category': ['A', 'B', 'C', 'D'],
    'value': [10, 25, 15, 30]
}
df = pd.DataFrame(data)
# Create a bar chart
bar_chart = Helpers.createTemplateOutputEChart(
    context=context,
    chartTitle="Category Distribution",
    dataFrame=df,
    chartType=ChartType.BAR,
    description="Distribution of values by category",
    group="Analysis Charts"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(bar_chart)
Notes:
Uses ECharts library for interactive visualizations
Common chart types include BAR, LINE, PIE, SCATTER, TABLE
createTemplateOutputPlotlibChart
@staticmethod
def createTemplateOutputPlotlibChart(context, chartTitle: str, plt, description=None, group=None):
Purpose: Creates a chart output from a matplotlib plot.
Parameters:
context: The context dictionary
chartTitle: Title of the chart
plt: Matplotlib pyplot object with the plot
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
import matplotlib.pyplot as plt
import numpy as np
# Create a matplotlib plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title("Sine Wave")
plt.xlabel("X")
plt.ylabel("sin(x)")
plt.grid(True)
# Create a chart output from the matplotlib plot
sine_chart = Helpers.createTemplateOutputPlotlibChart(
    context=context,
    chartTitle="Sine Wave Plot",
    plt=plt,
    description="A plot of the sine function",
    group="Mathematical Functions"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(sine_chart)
Notes:
Saves the matplotlib figure as a PNG file
The plot is automatically saved to the artifacts directory
createTemplateOutputPlotlyChart
@staticmethod
def createTemplateOutputPlotlyChart(context, chartTitle: str, plotly_fig, description=None, group=None):
Purpose: Creates a chart output from a Plotly figure.
Parameters:
context: The context dictionary
chartTitle: Title of the chart
plotly_fig: Plotly figure object
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
import plotly.express as px
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'x': range(10),
    'y': [i**2 for i in range(10)],
    'category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
})
# Create a Plotly figure
fig = px.scatter(df, x='x', y='y', color='category',
                 title='Quadratic Function by Category')
# Create a chart output from the Plotly figure
scatter_chart = Helpers.createTemplateOutputPlotlyChart(
    context=context,
    chartTitle="Quadratic Scatter Plot",
    plotly_fig=fig,
    description="A scatter plot showing x² by category",
    group="Mathematical Functions"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(scatter_chart)
Notes:
Saves the Plotly figure as an HTML file with interactive features
Uses CDN for Plotly JavaScript libraries to keep file size small
createTemplateOutputPlotlyChartAsJson
@staticmethod
def createTemplateOutputPlotlyChartAsJson(context, chartTitle: str, plotly_fig, group=None):
Purpose: Creates a chart output from a Plotly figure, saving it as JSON.
Parameters:
context: The context dictionary
chartTitle: Title of the chart
plotly_fig: Plotly figure object
group: Optional group name for organizing outputs
Returns: TemplateOutput object
Example:
import plotly.express as px
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'sales': [200, 150, 300, 250, 400]
})
# Create a Plotly figure
fig = px.bar(df, x='month', y='sales', title='Monthly Sales')
# Create a chart output from the Plotly figure as JSON
json_chart = Helpers.createTemplateOutputPlotlyChartAsJson(
    context=context,
    chartTitle="Monthly Sales Chart",
    plotly_fig=fig,
    group="Business Analytics"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(json_chart)
Notes:
Saves the Plotly figure as a JSON file
This format is useful when you need to load and modify the chart later
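To illustrate why the JSON form is convenient for later edits, the sketch below loads a figure description with the standard json module, changes the title and bar colour, and serializes it again. The JSON string is a hand-written stand-in for what a Plotly bar chart serializes to (top-level "data" and "layout" keys); where the helper actually stores its file is not covered here, so no path is assumed.

```python
import json

# A hand-written stand-in for the JSON produced from a Plotly bar chart.
saved_chart_json = json.dumps({
    "data": [{"type": "bar", "x": ["Jan", "Feb", "Mar"], "y": [200, 150, 300]}],
    "layout": {"title": {"text": "Monthly Sales"}},
})

# Load the figure description and modify it as plain Python data.
fig_dict = json.loads(saved_chart_json)
fig_dict["layout"]["title"]["text"] = "Monthly Sales (revised)"
fig_dict["data"][0]["marker"] = {"color": "seagreen"}

# Serialize again so the updated chart can be stored or rendered later.
updated_chart_json = json.dumps(fig_dict)
```

If Plotly is installed, plotly.io.from_json(updated_chart_json) should turn the edited JSON back into a figure object.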
Model and Metadata Functions
create_template_output_rc_ml_model
@staticmethod
def create_template_output_rc_ml_model(context, model_name, model_obj, artifacts, version="default"):
Purpose: Creates an output for a machine learning model.
Parameters:
context: The context dictionary
model_name: Name of the model
model_obj: Model object (must extend RCMLModel class)
artifacts: Dictionary of model artifacts
version: Optional version string for the model
Returns: TemplateOutput object
Raises: Exception if model_obj doesn't extend RCMLModel
Example:
from utils.dtos.rc_ml_model import RCMLModel
# Define a custom model class
class MyClassifier(RCMLModel):
    def __init__(self):
        super().__init__()
        self.model = None
    def fit(self, X, y):
        # Train the model
        from sklearn.ensemble import RandomForestClassifier
        self.model = RandomForestClassifier()
        self.model.fit(X, y)
    def predict(self, X):
        # Make predictions
        return self.model.predict(X)
# Create and train the model (X_train and y_train are assumed to be defined earlier)
model = MyClassifier()
model.fit(X_train, y_train)
# Save model artifacts
artifacts_dir = Helpers.getOrCreateArtifactsDir(context, "model_artifacts")
artifacts = {
    "model_info": "Random Forest Classifier with default parameters"
}
# Create model output
model_output = Helpers.create_template_output_rc_ml_model(
    context=context,
    model_name="customer_classifier",
    model_obj=model,  # the trained instance, which extends RCMLModel
    artifacts=artifacts,  # dictionary of model artifacts, as described in Parameters
    version="v1.0"
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(model_output)
Notes:
The model class must extend RCMLModel
Artifacts should contain any files needed for the model to function
get_rc_ml_model
@staticmethod
def get_rc_ml_model(context, model_name, model_version="default"):
Purpose: Loads a previously saved ML model.
Parameters:
context: The context dictionary
model_name: Name of the model to load
model_version: Version of the model (default is "default")
Returns: Loaded model object
Example:
# Load a saved model
model = Helpers.get_rc_ml_model(
    context=context,
    model_name="customer_classifier",
    model_version="v1.0"
)
# Use the model to make predictions
predictions = model.predict(X_test)
create_template_output_metadata
@staticmethod
def create_template_output_metadata(context, metadata_list):
Purpose: Creates an output for metadata information.
Parameters:
context: The context dictionary
metadata_list: List of metadata objects
Returns: TemplateOutput object
Example:
from utils.dtos.metadata import Metadata, MetadataSubjectType
# Create metadata objects
metadata1 = Metadata(
    subject_type=MetadataSubjectType.DATASET,
    subject_name="customers",
    key="row_count",
    value="5000"
)
metadata2 = Metadata(
    subject_type=MetadataSubjectType.MODEL,
    subject_name="customer_classifier",
    key="accuracy",
    value="0.92"
)
# Create metadata output
metadata_output = Helpers.create_template_output_metadata(
    context=context,
    metadata_list=[metadata1, metadata2]
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(metadata_output)
get_metadata_value
@staticmethod
def get_metadata_value(context, subject_type: MetadataSubjectType, subject_name: str, key: str):
Purpose: Retrieves a specific metadata value from the context.
Parameters:
context: The context dictionary
subject_type: Type of subject from MetadataSubjectType enum
subject_name: Name of the subject
key: Metadata key to retrieve
Returns: The metadata value, or None if not found
Example:
from utils.dtos.metadata import MetadataSubjectType
# Get metadata value
accuracy = Helpers.get_metadata_value(
    context=context,
    subject_type=MetadataSubjectType.MODEL,
    subject_name="customer_classifier",
    key="accuracy"
)
print(f"Model accuracy: {accuracy}")
get_all_metadata
@staticmethod
def get_all_metadata(context):
Purpose: Retrieves all metadata from the context.
Parameters:
context: The context dictionary
Returns: Dictionary of all metadata
Example:
# Get all metadata
all_metadata = Helpers.get_all_metadata(context)
# Print metadata information (assumes each entry is a dict with these keys)
for metadata in all_metadata:
    print(f"Subject: {metadata['subject_name']}, Key: {metadata['key']}, Value: {metadata['value']}")
create_template_output_answer
@staticmethod
def create_template_output_answer(context, answer):
Purpose: Creates an output containing an answer or result.
Parameters:
context: The context dictionary
answer: The answer text or object
Returns: TemplateOutput object
Example:
# Analyze data and create an answer
analysis_result = "Based on the data analysis, customer segment A shows the highest retention rate at 87%."
# Create answer output
answer_output = Helpers.create_template_output_answer(
    context=context,
    answer=analysis_result
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(answer_output)
create_template_output_vector_store
@staticmethod
def create_template_output_vector_store(context, vector_store_obj: VectorStoreBase):
Purpose: Creates an output for a vector store (used for embeddings and similarity search).
Parameters:
context: The context dictionary
vector_store_obj: Vector store object (must extend VectorStoreBase)
Returns: TemplateOutput object
Example:
from utils.libutils.vectorStores.faiss import FaissVectorStore
import numpy as np
# Create a vector store
vector_store = FaissVectorStore(name="document_embeddings")
# Add documents with their embeddings
documents = ["This is document 1", "This is document 2", "This is document 3"]
embeddings = np.random.rand(3, 128)  # Simulated embeddings
for i, (doc, embedding) in enumerate(zip(documents, embeddings)):
    vector_store.add_item(i, doc, embedding)
# Create vector store output
vs_output = Helpers.create_template_output_vector_store(
    context=context,
    vector_store_obj=vector_store
)
# Add it to the collection
output_collection = Helpers.getOutputCollection(context)
output_collection.addOutput(vs_output)
Convenience Save Functions
These functions combine creating an output and saving it to the context in a single step.
save_output_dataset
@staticmethod
def save_output_dataset(context, output_name, data_frame) -> TemplateOutput:
Purpose: Creates a dataset output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
output_name: Name of the dataset
data_frame: Pandas DataFrame with the data
Returns: The created TemplateOutput object
Example:
import pandas as pd
# Create a processed dataset
processed_df = pd.DataFrame({'id': [1, 2, 3], 'processed_value': [10, 20, 30]})
# Create, add to collection, and save the dataset output in one step
Helpers.save_output_dataset(context, "processed_data", processed_df)
Notes:
Automatically adds the output to the template output collection
Calls Helpers.save(context) for you
Returns the TemplateOutput object for reference if needed
save_output_plotly_chart_as_json
@staticmethod
def save_output_plotly_chart_as_json(context, chart_title, plotly_fig, group=None) -> TemplateOutput:
Purpose: Creates a Plotly chart output as JSON, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
chart_title: Title of the chart
plotly_fig: Plotly figure object
group: Optional group name for organizing outputs
Returns: The created TemplateOutput object
Example:
import plotly.express as px
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'sales': [200, 150, 300, 250, 400]
})
# Create a Plotly figure
fig = px.bar(df, x='month', y='sales', title='Monthly Sales')
# Create, add to collection, and save the chart output in one step
Helpers.save_output_plotly_chart_as_json(context, "Monthly Sales Chart", fig, group="Business Analytics")
save_output_plot_lib_chart
@staticmethod
def save_output_plot_lib_chart(context, chart_title, plt, description=None, group=None) -> TemplateOutput:
Purpose: Creates a matplotlib chart output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
chart_title: Title of the chart
plt: Matplotlib pyplot object with the plot
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: The created TemplateOutput object
Example:
import matplotlib.pyplot as plt
import numpy as np
# Create a matplotlib plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title("Sine Wave")
plt.xlabel("X")
plt.ylabel("sin(x)")
plt.grid(True)
# Create, add to collection, and save the chart output in one step
Helpers.save_output_plot_lib_chart(context, "Sine Wave Plot", plt,
    description="A plot of the sine function",
    group="Mathematical Functions")
save_output_echart
@staticmethod
def save_output_echart(context, chart_title, data_frame, chart_type: ChartType = ChartType.TABLE,
                       params=None, description=None, group=None) -> TemplateOutput:
Purpose: Creates an ECharts visualization, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
chart_title: Title of the chart
data_frame: Pandas DataFrame with the data to visualize
chart_type: Type of chart from the ChartType enum (default is TABLE)
params: Optional parameters for chart configuration
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: The created TemplateOutput object
Example:
from utils.dtos.templateOutput import ChartType
import pandas as pd
# Create a DataFrame with sample data
data = {
    'category': ['A', 'B', 'C', 'D'],
    'value': [10, 25, 15, 30]
}
df = pd.DataFrame(data)
# Create, add to collection, and save the chart output in one step
Helpers.save_output_echart(context, "Category Distribution", df, ChartType.BAR,
    description="Distribution of values by category",
    group="Analysis Charts")
save_output_chart
@staticmethod
def save_output_chart(context, title, metadata=None, description=None, group=None) -> TemplateOutput:
Purpose: Creates a base chart output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
title: Title of the chart
metadata: Optional metadata for the chart
description: Optional description of the chart
group: Optional group name for organizing outputs
Returns: The created TemplateOutput object
Example:
# Create, add to collection, and save a base chart output in one step
Helpers.save_output_chart(context, "Data Distribution",
    description="Distribution of values across categories",
    group="Analysis Charts")
save_output_rc_ml_model
@staticmethod
def save_output_rc_ml_model(context, model_name, model_obj, artifacts, version="default") -> TemplateOutput:
Purpose: Creates a machine learning model output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
model_name: Name of the model
model_obj: Model object (must extend RCMLModel class)
artifacts: Dictionary of model artifacts
version: Optional version string for the model
Returns: The created TemplateOutput object
Example:
from utils.dtos.rc_ml_model import RCMLModel
# Define a custom model class
class MyClassifier(RCMLModel):
    def __init__(self):
        super().__init__()
        self.model = None
    def fit(self, X, y):
        # Train the model
        from sklearn.ensemble import RandomForestClassifier
        self.model = RandomForestClassifier()
        self.model.fit(X, y)
    def predict(self, X):
        # Make predictions
        return self.model.predict(X)
# Create and train the model (X_train and y_train are assumed to be defined earlier)
model = MyClassifier()
model.fit(X_train, y_train)
# Save model artifacts
artifacts_dir = Helpers.getOrCreateArtifactsDir(context, "model_artifacts")
artifacts = {
    "model_info": "Random Forest Classifier with default parameters"
}
# Create, add to collection, and save the model output in one step
Helpers.save_output_rc_ml_model(context, "customer_classifier", MyClassifier, "model_artifacts", "v1.0")
save_output_mlflow_model
@staticmethod
def save_output_mlflow_model(context, model_name, rc_version=None, mlflow_model_version=None, experiment_name=None, run_id=None) -> TemplateOutput:
Purpose: Creates and saves a template output for an MLflow-tracked model. This function stores the MLflow model reference in the context and adds the generated output to the output collection in a single step.
Parameters:
context: The workflow or notebook context dictionary.
model_name: (str) Name of the MLflow model.
rc_version: (str, optional) Version identifier for the model within RapidCanvas. Defaults to "default" if not provided.
mlflow_model_version: (str or int, optional) Specific version of the MLflow model to reference.
experiment_name: (str, optional) MLflow experiment name where the model run is logged.
run_id: (str, optional) MLflow run ID to uniquely identify a model run.
Returns: The created TemplateOutput object representing the MLflow model output
Example:
# Assuming 'context' is initialized and MLflow tracking is set up
model_name = "churn_prediction_model"
rc_version = "v2.1"
mlflow_model_version = 5
experiment_name = "ChurnPrediction"
run_id = "e9bdc6c1e2b442b38bdaafc8798e8a1f"
# Store the MLflow model reference and save the output in one step
Helpers.save_output_mlflow_model(
    context,
    model_name,
    rc_version=rc_version,
    mlflow_model_version=mlflow_model_version,
    experiment_name=experiment_name,
    run_id=run_id
)
Notes:
This method is typically used after training and logging a model to MLflow.
It leverages the MLflowUtils helper methods to ensure that model references and outputs are properly recorded.
The context is saved automatically after adding the output, ensuring changes are persisted.
save_output_artifacts
@staticmethod
def save_output_artifacts(context, artifact_name) -> TemplateOutput:
Purpose: Creates an artifact output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
artifact_name: Name of the artifact
Returns: The created TemplateOutput object
Example:
import os
import pickle
# First create an artifacts directory
artifacts_dir = Helpers.getOrCreateArtifactsDir(context, "model_files")
# Save model files to the directory (model is assumed to be a trained, picklable object)
model_path = os.path.join(artifacts_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
# Create, add to collection, and save the artifact output in one step
Helpers.save_output_artifacts(context, "model_files")
save_output_answer
@staticmethod
def save_output_answer(context, answer) -> TemplateOutput:
Purpose: Creates an answer output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
answer: The answer text or object
Returns: The created TemplateOutput object
Example:
# Analyze data and create an answer
analysis_result = "Based on the data analysis, customer segment A shows the highest retention rate at 87%."
# Create, add to collection, and save the answer output in one step
Helpers.save_output_answer(context, analysis_result)
save_output_metadata
@staticmethod
def save_output_metadata(context, metadata_list) -> TemplateOutput:
Purpose: Creates a metadata output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
metadata_list: List of metadata objects
Returns: The created TemplateOutput object
Example:
from utils.dtos.metadata import Metadata, MetadataSubjectType
# Create metadata objects
metadata1 = Metadata(
    subject_type=MetadataSubjectType.DATASET,
    subject_name="customers",
    key="row_count",
    value="5000"
)
metadata2 = Metadata(
    subject_type=MetadataSubjectType.MODEL,
    subject_name="customer_classifier",
    key="accuracy",
    value="0.92"
)
# Create, add to collection, and save the metadata output in one step
Helpers.save_output_metadata(context, [metadata1, metadata2])
save_output_file
@staticmethod
def save_output_file(context, output_name, file_contents, file_type: FileType = FileType.TEXT) -> TemplateOutput:
Purpose: Creates a file output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
output_name: Name of the output
file_contents: String content of the file
file_type: Type of file from the FileType enum (TEXT, JSON, MARKDOWN, HTML)
Returns: The created TemplateOutput object
Example:
import json
from utils.dtos.templateOutput import FileType
# Create a JSON configuration file
config = {
    "model_params": {
        "n_estimators": 100,
        "max_depth": 5
    }
}
json_content = json.dumps(config, indent=2)
# Create, add to collection, and save the file output in one step
Helpers.save_output_file(context, "model_config", json_content, FileType.JSON)
save_output_vector_store
@staticmethod
def save_output_vector_store(context, vector_store_obj) -> TemplateOutput:
Purpose: Creates a vector store output, adds it to the collection, and saves the context in one step.
Parameters:
context: The context dictionary
vector_store_obj: Vector store object (must extend VectorStoreBase)
Returns: The created TemplateOutput object
Example:
from utils.libutils.vectorStores.faiss import FaissVectorStore
import numpy as np
# Create a vector store
vector_store = FaissVectorStore(name="document_embeddings")
# Add documents with their embeddings
documents = ["This is document 1", "This is document 2", "This is document 3"]
embeddings = np.random.rand(3, 128) # Simulated embeddings
for i, (doc, embedding) in enumerate(zip(documents, embeddings)):
    vector_store.add_item(i, doc, embedding)
# Create, add to collection, and save the vector store output in one step
Helpers.save_output_vector_store(context, vector_store)
Utility Functions
initH2o
@staticmethod
def initH2o(h2o=None, h2oServerUrl=None, init_type=1):
Purpose: Initializes the H2O machine learning library.
Parameters:
h2o: Optional H2O module (imported if None)
h2oServerUrl: URL of the H2O server to connect to
init_type: 0 to connect only; 1 (default) to initialize a local instance if the connection fails
Returns: Initialized H2O module
Example:
# Initialize H2O framework
h2o = Helpers.initH2o()
# Load data into H2O frame
h2o_frame = h2o.import_file("/path/to/data.csv")
# Train a model using H2O
model = h2o.estimators.gbm.H2OGradientBoostingEstimator()
model.train(x=predictors, y=target, training_frame=h2o_frame)
Notes:
If no server URL is provided, the function tries to read one from an environment variable
If no server is available, it creates a local instance with a random name
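The connect-then-fall-back behavior described in these notes can be illustrated without H2O itself. The following is only a sketch of the control flow, not the library's implementation: the function `resolve_h2o`, the environment variable name `H2O_SERVER_URL`, and the `connect` and `start_local` callables are all hypothetical stand-ins.

```python
import os
import uuid


def resolve_h2o(connect, start_local, server_url=None, init_type=1,
                env_var="H2O_SERVER_URL"):
    """Connect to a server if possible, otherwise optionally start a local one.

    Every name here is hypothetical; only the control flow mirrors the notes.
    """
    # Fall back to the environment when no URL is passed explicitly.
    url = server_url or os.environ.get(env_var)
    if url:
        try:
            return connect(url)
        except ConnectionError:
            if init_type == 0:
                # Connect-only mode: surface the failure to the caller.
                raise
    elif init_type == 0:
        raise ConnectionError("no server URL available in connect-only mode")
    # init_type == 1: start a local instance under a random name.
    return start_local(name=f"local-{uuid.uuid4().hex[:8]}")


def failing_connect(url):
    # Simulates an unreachable server.
    raise ConnectionError(f"cannot reach {url}")


def fake_start_local(name):
    # Simulates starting a local instance.
    return {"mode": "local", "name": name}


session = resolve_h2o(failing_connect, fake_start_local,
                      server_url="http://h2o.invalid:54321")
print(session["mode"])  # local
```

With `init_type=0` the same call would re-raise the connection error instead of starting a local instance.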
getChildDir
@staticmethod
def getChildDir(context):
Purpose: Returns the transform directory for saving temporary files.
Parameters:
context: The context dictionary
Returns: Path to the child directory
Example:
import os
# Get the transform directory (df is assumed to be an existing DataFrame)
temp_dir = Helpers.getChildDir(context)
# Use it to save temporary files
temp_file_path = os.path.join(temp_dir, "temp_data.csv")
df.to_csv(temp_file_path, index=False)
getTenantId
@staticmethod
def getTenantId(context):
Purpose: Retrieves the tenant ID from the context.
Parameters:
context: The context dictionary
Returns: Tenant ID string
Example:
# Get the tenant ID
tenant_id = Helpers.getTenantId(context)
print(f"Current tenant: {tenant_id}")
Notes:
Checks multiple keys for backward compatibility
Returns "test-tenant" as fallback if not found
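A lookup that checks several keys and then falls back to a default might look like the sketch below. The key names `tenantId` and `tenant_id` and the function `lookup_tenant_id` are hypothetical, since the page does not list the keys actually checked; only the "test-tenant" fallback comes from the notes above.

```python
def lookup_tenant_id(context, keys=("tenantId", "tenant_id"), default="test-tenant"):
    """Return the first non-empty value found under any candidate key."""
    for key in keys:  # hypothetical key names, tried in order
        value = context.get(key)
        if value:
            return value
    # None of the candidate keys was present: use the documented fallback.
    return default


print(lookup_tenant_id({"tenant_id": "acme"}))  # acme
print(lookup_tenant_id({}))                     # test-tenant
```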
get_file_data
@staticmethod
def get_file_data(context, file_name):
Purpose: Gets the content of a file by name.
Parameters:
context: The context dictionary
file_name: Name of the file to read
Returns: File content as string or None if file doesn't exist
Raises: Exception if the file is not found
Example:
import json
# Get the content of a configuration file
try:
    config_content = Helpers.get_file_data(context, "config.json")
    config = json.loads(config_content)
    print(f"Loaded configuration with {len(config)} settings")
except Exception as e:
    print(f"Error loading file: {e}")
How it works:
Checks if files_data exists in the context
Searches for the file with the matching name
If the file path doesn't exist locally, downloads it
Reads and returns the file contents
Raises an exception if the file is not found, listing available files
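The steps above can be sketched on a plain dictionary. This is an illustration under stated assumptions, not the actual helper: the layout of `files_data` (a list of dictionaries with `name` and `path` keys), the function `read_file_by_name`, and the optional `download` callable are all hypothetical.

```python
import os
import tempfile


def read_file_by_name(context, file_name, download=None):
    """Look up file_name in context['files_data'] and return its text content."""
    files = context.get("files_data") or []
    for entry in files:
        if entry["name"] == file_name:
            path = entry["path"]
            if not os.path.exists(path) and download is not None:
                # Stand-in for the download step; a real one would fetch the file.
                download(entry)
            with open(path, "r", encoding="utf-8") as handle:
                return handle.read()
    # No match: report which files are available, as the steps describe.
    available = [entry["name"] for entry in files]
    raise Exception(f"File '{file_name}' not found. Available files: {available}")


with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write('{"threshold": 0.5}')
    demo_context = {"files_data": [{"name": "config.json", "path": config_path}]}
    content = read_file_by_name(demo_context, "config.json")
    try:
        read_file_by_name(demo_context, "missing.txt")
        error_message = ""
    except Exception as exc:
        error_message = str(exc)

print(content)
print(error_message)
```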
generate_warning
@staticmethod
def generate_warning(context, warning):
Purpose: Adds a warning message to the context.
Parameters:
context: The context dictionary
warning: Warning message to add
Example:
# Generate a warning about missing data
Helpers.generate_warning(
    context,
    "10% of records have missing values in the 'age' column"
)
Notes:
Warnings are automatically included in the output when save() is called
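One way to picture how warnings travel from the context into the saved output is the sketch below. It assumes, purely for illustration, that warnings are kept in a list under a `warnings` key and merged into a configuration mapping at save time; the key names and both functions are hypothetical and not taken from the library.

```python
import json


def add_warning(context, warning):
    """Append a warning message to a list kept on the context."""
    context.setdefault("warnings", []).append(warning)


def build_config_map(context, output_files):
    """Combine output file names and accumulated warnings into one mapping."""
    return {
        "outputs": list(output_files),
        "warnings": list(context.get("warnings", [])),
    }


demo_context = {}
add_warning(demo_context, "10% of records have missing values in the 'age' column")
config_map = build_config_map(demo_context, ["processed_data.csv"])
print(json.dumps(config_map, indent=2))
```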
save
@staticmethod
def save(context):
Purpose: Saves all outputs and artifacts in the context to disk.
Parameters:
context: The context dictionary
Raises: Exception if no outputs are in the template output collection
Example:
# At the end of your notebook, save all outputs
Helpers.save(context)
How it works:
Checks if there are outputs in the template output collection
Only uploads artifacts if should_materialize is not in the context or is True
Validates that all artifacts have corresponding outputs
Processes special output types (like ML models)
Persists all outputs to disk
Writes a configuration map with output file names and warnings
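The first three steps can be sketched on a plain dictionary to show how the should_materialize flag gates uploading. This is a simplified illustration, not the real save(): the `outputs` and `artifacts` keys, the `save_sketch` function, and the `upload_artifacts` callable are hypothetical.

```python
def save_sketch(context, upload_artifacts):
    """Mimic the documented control flow of save() on a plain dictionary."""
    outputs = context.get("outputs", [])
    if not outputs:
        raise Exception("No outputs found in the template output collection")

    # A missing flag is treated the same as True.
    if context.get("should_materialize", True):
        upload_artifacts(context.get("artifacts", []))

    # Every artifact must be referenced by some output.
    output_names = {output["name"] for output in outputs}
    orphans = [name for name in context.get("artifacts", []) if name not in output_names]
    if orphans:
        raise Exception(f"Artifacts without outputs: {orphans}")
    return sorted(output_names)


uploaded = []
demo_context = {
    "outputs": [{"name": "model_files"}],
    "artifacts": ["model_files"],
    "should_materialize": False,
}
saved = save_sketch(demo_context, uploaded.extend)
print(saved, uploaded)  # ['model_files'] []
```

Removing `should_materialize` from the dictionary, or setting it to True, would cause the artifact list to be passed to the upload callable.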
Notes:
This should be called at the end of your notebook
Handles uploading artifacts, packaging ML models, and saving all outputs
You must have at least one output in the template output collection
Uses the should_materialize flag to conditionally skip artifact uploading in test environments