Create a template

NOTE: Charts and Artifacts to be added.

This guide explains how to create a curated template in RapidCanvas. The templates you create will be used by no-code users on the RapidCanvas platform to perform a wide range of tasks on a wide range of datasets.

Goal

Your goal is to create a generalized template notebook file that is well documented, usable and understandable for no-code users, and works across any relevant dataset.

Setup

RapidCanvas lets you create new templates and use existing ones. Templates save time and effort by packaging common logic so that it can be reused across different types of data.

To start building a template, follow the instructions below:

You will work within a folder named template-lib. Your project will be located in the projects folder.
Under projects you will find an example project, employee-v3-syntax. This folder contains transforms and the main flow file, and shows how the SDK is used in a project.
Create an account on http://staging.dev.rapidcanvas.net/#/
Go to the template-lib folder, run the run_tmpltlib.py script (./run_tmpltlib.py), and follow the steps in your command line.

If you open the example employee project folder, you'll see a few folders (data, output, and transforms) as well as a Jupyter notebook file. For now, you only need to be concerned with the transforms folder and the flow notebook.

Building a solution in RapidCanvas involves two types of Jupyter notebook files: transform files and flow files. A transform file is a notebook that implements an individual data transformation or operation performed on an input dataset.

Transforms are used inside of the main flow file. The flow file is where we create our project, authenticate on RapidCanvas, and build our data pipeline. To use a transform file inside of the flow, we create a recipe. A recipe is an instantiation of one or more transforms. See the titanic example. The recipe is then run inside of the flow file and produces an output dataset.
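The relationship between transforms and recipes can be pictured with a toy model in plain Python. This is not the RapidCanvas SDK: ToyTransform and ToyRecipe are hypothetical names, and datasets are represented as lists of dictionaries purely for illustration. The sketch only shows the idea that a recipe runs one or more transforms in order and produces an output dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Dataset = List[Row]


@dataclass
class ToyTransform:
    """Hypothetical stand-in for a transform: one operation on a dataset."""
    name: str
    func: Callable[[Dataset], Dataset]


@dataclass
class ToyRecipe:
    """Hypothetical stand-in for a recipe: one or more transforms run in order."""
    name: str
    transforms: List[ToyTransform] = field(default_factory=list)

    def add_transform(self, transform: ToyTransform) -> None:
        self.transforms.append(transform)

    def run(self, dataset: Dataset) -> Dataset:
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            dataset = transform.func(dataset)
        return dataset


def add_full_name(rows: Dataset) -> Dataset:
    # Build new rows so the input dataset is left untouched.
    return [{**row, "full_name": f"{row['first']} {row['last']}"} for row in rows]


def drop_name_parts(rows: Dataset) -> Dataset:
    return [{k: v for k, v in row.items() if k not in ("first", "last")}
            for row in rows]


employees = [{"first": "Ada", "last": "Lovelace"}]
recipe = ToyRecipe(name="clean_names")
recipe.add_transform(ToyTransform("add_full_name", add_full_name))
recipe.add_transform(ToyTransform("drop_name_parts", drop_name_parts))
result = recipe.run(employees)
print(result)
```

In the real flow file, the SDK's own recipe and transform objects play these roles, and the transform logic lives in a notebook rather than in a Python function.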

Try running employee_flow.ipynb and follow along inside of the RapidCanvas UI step-by-step to see how the pipeline is built from the flow.

Creating the Template

To create a template, you will write a Jupyter notebook that takes a set of input parameters and produces the required output. Requirements for each specific template will be sent to you.

You are given a project folder called “employee-v3-syntax” which contains a flow file, two example transform notebooks, and some test datasets to use to build your template.

Start by launching the flow file with the run_tmpltlib.py script and running it (you may need to edit the flow notebook to remove the existing auth token) to see how the flow corresponds with the UI. Follow along on the UI to see what is happening.

Next, create a copy of one of the transform notebooks inside of the transforms folder. Change the name of the notebook to the name of the transform that you are building.

This notebook is where you will write all of the code for your template.

Then, run and debug the notebook you have created by following the example code inside of the flow notebook. The notebook demonstrates how to publish and use your template with the two given examples.

Documenting the Template

When creating the template, you need to document a few things inside of the notebook:

The input parameters: These should be understandable for a no-code user and have clear descriptions for the tooltip in the UI. See the example transform for more. While creating the template, check how it behaves inside of the UI; this is the main point: a business user should be able to use the template on their own dataset.

Usage of the template: In a markdown block, describe the expected usage of the template, including which datasets it works for, best practices, and any usage boundaries/conditions.

Testing the Template

To test the template, pick a few relevant datasets. Some have been provided in the data folder of the template_creation folder. Test the template using the chosen datasets. Try using it inside of the UI by modifying the input parameters there. You can do this by editing the recipe and hitting the edit & test button on the listed transform.

After testing, fix any unexpected results/bugs and document any usage boundaries that you have discovered. Once this is done and you are satisfied with the result, send the project folder back to us. We will then review your results and come back with notes or any needed changes.
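One way to organize this kind of testing before trying the template in the UI is to keep the core logic in a plain function and run it against several small datasets, including awkward ones. The sketch below is an illustration only: years_between and add_age are hypothetical functions, rows are represented as dictionaries, and nothing here calls RapidCanvas.

```python
from datetime import date
from typing import Dict, List, Optional

Row = Dict[str, Optional[date]]


def years_between(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Completed years from start to end; None if either date is missing."""
    if start is None or end is None:
        return None
    years = end.year - start.year
    # Subtract one year if the anniversary has not been reached yet.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def add_age(rows: List[Row], start_col: str, end_col: str, out_col: str) -> List[dict]:
    """Return copies of the rows with out_col added."""
    return [{**row, out_col: years_between(row.get(start_col), row.get(end_col))}
            for row in rows]


# Small datasets that each probe a different usage boundary.
test_datasets = {
    "typical": [{"birth_date": date(1990, 5, 17), "start_date": date(2020, 5, 16)}],
    "missing_value": [{"birth_date": None, "start_date": date(2020, 1, 1)}],
    "empty": [],
}

results = {name: add_age(rows, "birth_date", "start_date", "age")
           for name, rows in test_datasets.items()}
for name, rows in results.items():
    print(name, [row["age"] for row in rows])
```

Any boundary that such a run exposes, for example how missing dates are handled, is a good candidate for the usage notes in the notebook.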

Displaying a template on the UI

When creating a transform, you can display it on the UI. You can use this to expose parameters inside of the UI so that your transform can be used by non-technical users and work across projects.

Displaying the transformation using DataApp v3

To expose parameters on the UI, you must define each parameter and its properties in the second code block of your transform notebook. Each parameter is defined by a call to a “get or create” function that takes three arguments: a name, which can be used inside the flow file to pass variables into your transformation; metadata, which defines how the parameter will be used and displayed (the examples below build it with Metadata(...)); and a local_context, which will always be equal to locals().

Creating an input dataset

To add an input dataset to be used in your transformation, use the get_or_create_input_dataset method. Here is an example:

  inputDatasetParameter = Helpers.get_or_create_input_dataset(
    name="inputDataset",
    metadata=Metadata(input_name='Input Dataset', is_required=True, tooltip='Dataset to apply the transformation'),
    local_context=locals()
  )

The input dataset can then be used in your transform through inputDatasetParameter.value.

We recommend getting the input dataset value and assigning it to a variable, like the following:

  inDF = Helpers.getEntityData(context, inputDatasetParameter.value)

Required metadata fields: input_name, is_required, tooltip

Creating an input variable

To add an input variable to be used in your transformation, use the get_or_create_input_var method. Here is an example:

  start_dateParameter = Helpers.get_or_create_input_var(
      name="start_date",
      metadata=Metadata(input_name="Start Date", is_required=True, tooltip="Initial date to do the diff", multiple=False, datatypes=['TIMESTAMP'], options=['FIELDS', 'CONSTANT'], dataset=['inputDataset']),
      local_context=locals()
  )

The variable can then be used in your transform through start_dateParameter.value.

Required metadata fields: input_name, is_required, tooltip, multiple, datatypes, options

Creating an output dataset parameter

To give users the ability to define the name of an output dataset, use the get_or_create_output_dataset method. Here is an example:

  outputDatasetParameter = Helpers.get_or_create_output_dataset(
    name="outputDataset",
    metadata=Metadata(input_name='Output Dataset', is_required=True, tooltip='Dataset name to be created after the transformation'),
    local_context=locals()
  )

The output dataset name given by the user can then be accessed through outputDatasetParameter.value.

Required metadata fields: input_name, is_required, tooltip

Creating an output chart

To give users the ability to name the output charts, use the get_or_create_output_chart method. Example:

  outputChartParameter = Helpers.get_or_create_output_chart(
      name="outputChart",
      metadata=Metadata(input_name='Output Chart Name', is_required=True, tooltip='Name of the output chart'),
      local_context=locals()
  )

The name of the output chart given by the user is then accessed inside the transform notebook through outputChartParameter.value.

Required metadata fields: input_name, is_required, tooltip
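The required metadata fields listed in this section can be gathered into a small pre-flight check. The sketch below is a hypothetical helper, not part of the RapidCanvas SDK: it works on plain dictionaries rather than Metadata objects, and the parameter kind labels ("input_dataset", "input_var", and so on) are names chosen here for illustration.

```python
from typing import Dict, List

# Required metadata fields per parameter kind, as listed in this guide.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "input_dataset": ["input_name", "is_required", "tooltip"],
    "input_var": ["input_name", "is_required", "tooltip",
                  "multiple", "datatypes", "options"],
    "output_dataset": ["input_name", "is_required", "tooltip"],
    "output_chart": ["input_name", "is_required", "tooltip"],
}


def missing_metadata_fields(kind: str, metadata: Dict[str, object]) -> List[str]:
    """Hypothetical helper: list required fields absent from a metadata dict."""
    return [name for name in REQUIRED_FIELDS[kind] if name not in metadata]


# An input variable definition that forgot the options field.
start_date_metadata = {
    "input_name": "Start Date",
    "is_required": True,
    "tooltip": "Initial date to do the diff",
    "multiple": False,
    "datatypes": ["TIMESTAMP"],
}
missing = missing_metadata_fields("input_var", start_date_metadata)
print(missing)
```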

Metadata fields

| Field | Description |
| --- | --- |
| default_value | Value that will be used if the user does not enter any input for the parameter. |
| input_name | Name of the parameter to be displayed on the UI. |
| datatypes | List that limits the datatypes accepted by the parameter to any of STRING, LONG, DOUBLE, BOOLEAN, TIMESTAMP, or ALL. |
| options | List that defines what the data input options are. They can be 'FIELDS', 'CONSTANT', or both. FIELDS allows a user to select a column/field from the datasets defined by datasets. If you choose FIELDS for options, you must define the datasets from which the user can select the columns. If CONSTANT is used, you can add a list of the possible options that users can choose from by defining constant_options. |
| constant_options | List of options that you want presented to the user. See options above. |
| datasets | List of datasets that will be used to populate field/column options. See options above. |
| tooltip | Description of the input parameter. This will be displayed to the user when they click on a “?” icon. Here is where you should clearly and concisely describe the parameter and how it will be used. It is best to keep it to a sentence or two. |
| is_required | Boolean value that defines whether or not the value is needed for your transformation. |
| multiple | Boolean value that defines whether or not a user can select multiple values for the input parameter. |
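The rules in the field descriptions above (allowed datatypes, allowed options, and FIELDS requiring datasets) can be expressed as a quick consistency check. This is a sketch under assumptions: metadata_problems is a hypothetical helper that inspects a plain dictionary using the field names described above, and it is not an SDK function.

```python
from typing import Any, Dict, List

# Allowed values as described in this guide.
ALLOWED_DATATYPES = {"STRING", "LONG", "DOUBLE", "BOOLEAN", "TIMESTAMP", "ALL"}
ALLOWED_OPTIONS = {"FIELDS", "CONSTANT"}


def metadata_problems(metadata: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: describe rule violations in a metadata dict."""
    problems: List[str] = []
    datatypes = metadata.get("datatypes") or []
    options = metadata.get("options") or []

    for value in datatypes:
        if value not in ALLOWED_DATATYPES:
            problems.append(f"unknown datatype: {value}")
    for value in options:
        if value not in ALLOWED_OPTIONS:
            problems.append(f"unknown option: {value}")
    # FIELDS needs datasets so the UI knows which columns to offer.
    if "FIELDS" in options and not metadata.get("datasets"):
        problems.append("FIELDS requires a non-empty datasets list")
    return problems


good = {"datatypes": ["TIMESTAMP"], "options": ["FIELDS", "CONSTANT"],
        "datasets": ["inputDataset"]}
bad = {"datatypes": ["DATE"], "options": ["FIELDS"]}

print(metadata_problems(good))
print(metadata_problems(bad))
```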

Using a template in a flow notebook

To use a template inside of the flow notebook, start by creating a new template using the syntax in the following example:

  time_diff_template = TemplateV2(
      name="time_diff", description="Calculate the time difference between two dates",
      source="CUSTOM", status="ACTIVE", tags=["UI", "Scalar"]
  )

Give your template a name, a description, and tags. For now, your source should always be "CUSTOM" and your status should always be "ACTIVE".

Next, you will add the transform notebook you have made to a template transform. See the following example for syntax:

  time_diff_template_transform = TemplateTransformV2(
      type="python", params=dict(notebookName="timediff.ipynb")
  )

Then you will add the template transform to the template and publish your template:

  time_diff_template.base_transforms = [time_diff_template_transform]
  time_diff_template.publish("transforms/timediff.ipynb")

To use your published template as a transform, create a transform object, set its templateId to the id of your published template, give it a name, and pass in the values you want to use for your variables:

  calculate_age_transform = Transform()
  calculate_age_transform.templateId = time_diff_template.id
  calculate_age_transform.name = 'age'
  calculate_age_transform.variables = {
      'inputDataset': 'employee',
      'start_date': 'birth_date',
      'end_date': 'start_date',
      'how': 'years',
      'outputcolumn': 'age',
      'outputDataset': 'employee_with_age'
  }

Note that the names of the variables were defined previously inside of the transform notebook with the get_or_create...() methods.
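Because the variable keys must match the names declared in the notebook exactly, a mismatch such as a capitalization typo is easy to miss. The sketch below shows one way to compare the two lists before running a recipe. compare_variables is a hypothetical helper, and the list of declared names is assumed here for illustration (only inputDataset, start_date, and outputDataset declarations appear in this guide).

```python
from typing import Any, Dict, Iterable, List


def compare_variables(declared: Iterable[str],
                      variables: Dict[str, Any]) -> Dict[str, List[str]]:
    """Hypothetical helper: compare declared parameter names with passed keys."""
    declared_names = set(declared)
    passed_names = set(variables)
    return {
        # Declared in the notebook but not passed (fine for optional parameters).
        "not_passed": sorted(declared_names - passed_names),
        # Passed from the flow but never declared, often a typo.
        "not_declared": sorted(passed_names - declared_names),
    }


# Names assumed to be declared in the transform notebook.
declared = ["inputDataset", "start_date", "end_date", "how",
            "outputcolumn", "outputDataset"]
variables = {
    "inputDataset": "employee",
    "start_date": "birth_date",
    "end_date": "start_date",
    "how": "years",
    "outputColumn": "age",  # typo: capital C
    "outputDataset": "employee_with_age",
}
report = compare_variables(declared, variables)
print(report)
```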

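Now you are ready to add the transform to a recipe and run the recipe. The example below assumes that a recipe object named calculate_age_recipe has already been created in the flow file.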

  calculate_age_recipe.add_transform(calculate_age_transform)
  calculate_age_recipe.run()

For further reference, see the employee_flow.ipynb example in the employee-v3-syntax project folder.
