RapidCanvas Concepts
Basic
User
Anyone with access to the web or notebook interface of RapidCanvas is a user. There are two types of users: Admin and Read-Only. Admin users can invite other team members to the tenant. All users gain access to RapidCanvas (RC) by invitation to join a tenant, and can create and collaborate on projects in the tenants to which they have access. A user can belong to multiple tenants and hold different roles across them.
Project
A project on RapidCanvas is typically an exercise to solve a business-relevant data science problem. A project is a combination of user and/or organizational assets such as data, recipes, and dashboards. It encapsulates the journey from raw data to the predictions the business requires. A project's output can be a productionized machine learning model that predicts on new incoming data, or a dashboard that shows relevant data metrics for your team.
Use Case - Your marketing team would like to solve two problems: recommend new products to users based on their past purchases, and customize coupon recommendations for each user. Each problem can be tackled in its own project.
Projects are created in tenants and can be associated with custom environments.
Canvas
A canvas in RapidCanvas is a directed acyclic graph (DAG) that lets business users chain data input, data processing, and modeling steps to build a project. The canvas provides a holistic view of the project and simplifies explainability.
A canvas consists of nodes and connectors. Nodes can be data sets, recipes, or dashboards, each of which is described in the sections that follow.
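To make the graph structure concrete, here is a minimal sketch in plain Python that models a small canvas as a DAG and derives an order in which its nodes could be processed. It is an illustration only and does not use the RapidCanvas API; the node names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical canvas: each key is a node, each value is the set of
# upstream nodes it depends on. Node names are made up for illustration.
canvas = {
    "raw_orders": set(),                    # data set
    "clean_orders": {"raw_orders"},         # recipe
    "cleaned_table": {"clean_orders"},      # data set produced by the recipe
    "train_model": {"cleaned_table"},       # recipe
    "sales_dashboard": {"cleaned_table"},   # dashboard
}


def execution_order(graph):
    """Return the nodes in an order where every node follows its inputs.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    exactly what the "acyclic" requirement rules out.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(canvas)
print(order)
```

Because the graph has no cycles, every node can be placed after all of its inputs. Adding an edge that points back upstream (for example, making raw_orders depend on train_model) would raise CycleError instead.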
Connectors
Connectors are the locations where your data typically resides. A connector can be local storage (your computer), cloud storage (Amazon S3, GCP, Azure Blob Storage, MongoDB, MySQL, Amazon Redshift, and Snowflake), or an on-premise system. RapidCanvas can connect to these data sources and fetch data into your RapidCanvas interface.
Data table
Data imported into RapidCanvas, or output data generated within a RapidCanvas project after processing, is considered a data table. Data tables are bound to a project and are represented by a rectangle on the canvas.
Recipe
A recipe is either a single transform or a collection of multiple transforms. Recipes are used to process data tables and to build and run machine learning models in RC. A recipe can output a data table, a dashboard, or a model. A recipe is represented by a circle on the canvas.
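The idea of a recipe as an ordered collection of transforms can be sketched in plain Python as follows. This is a conceptual illustration only, not the RapidCanvas API: the function names are hypothetical, and a data table is stood in for by a list of dictionaries.

```python
def drop_missing_quantity(rows):
    """Transform: keep only rows that have a quantity."""
    return [row for row in rows if row.get("quantity") is not None]


def add_total(rows):
    """Transform: add a 'total' column equal to quantity * unit_price."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def run_recipe(table, transforms):
    """Apply each transform in order, feeding each output into the next."""
    for transform in transforms:
        table = transform(table)
    return table


# Hypothetical input data table.
orders = [
    {"quantity": 2, "unit_price": 5},
    {"quantity": None, "unit_price": 3},
]

# A recipe made of two transforms; a single-transform recipe would
# simply pass a one-element list.
output_table = run_recipe(orders, [drop_missing_quantity, add_total])
print(output_table)
```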
Transform
A transform is a data processing unit that takes one or more inputs and generates an output. A simple example of a transform is concatenating two columns: the inputs are first name and last name, and the output is the full name.
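The column-concatenation example can be written out as a short plain-Python sketch. It is not a RapidCanvas system template; the function name, its parameters, and the sample rows are assumptions made for illustration.

```python
def concat_columns(rows, left, right, output, sep=" "):
    """Return new rows with an extra column joining two existing columns.

    The input rows are left unmodified.
    """
    return [{**row, output: f"{row[left]}{sep}{row[right]}"} for row in rows]


# Hypothetical input table with the two input columns.
people = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
]

result = concat_columns(people, "first_name", "last_name", "full_name")
print(result[0]["full_name"])  # prints "Ada Lovelace"
```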
Advanced
Workspace
A workspace is the space in which a group of users works together. Data and projects inside a workspace are accessible only to the users who are part of that workspace. Typically, organizations with a large number of projects use multiple workspaces to organize the streams of AI/ML problems they solve.
Environment
An environment is an infrastructure unit created under a tenant. It defines the compute resources and user-selected packages that you may need to run your projects. RapidCanvas provides a set of standard configurations of compute resources. Within a tenant, users can create multiple custom environments, and each project can be associated with a custom environment.
Standard configurations of compute resources provided by RapidCanvas are:
SMALL: 1 Core, 2GB Memory
MEDIUM: 2 Cores, 4GB Memory
LARGE: 4 Cores, 8GB Memory
CPU_LARGE: 8 Cores, 16GB Memory
MAX_LARGE: 12 Cores, 32GB Memory
EXTRA_MAX_LARGE: 12 Cores, 48GB Memory
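As a hedged illustration of how the list above might be used when sizing an environment, the following plain-Python sketch picks the smallest standard configuration that satisfies a given core and memory requirement. The configuration values come from the list above; the helper function and its name are hypothetical and are not part of RapidCanvas.

```python
from typing import NamedTuple, Optional


class EnvConfig(NamedTuple):
    name: str
    cores: int
    memory_gb: int


# Standard configurations as listed above, ordered smallest to largest.
STANDARD_CONFIGS = [
    EnvConfig("SMALL", 1, 2),
    EnvConfig("MEDIUM", 2, 4),
    EnvConfig("LARGE", 4, 8),
    EnvConfig("CPU_LARGE", 8, 16),
    EnvConfig("MAX_LARGE", 12, 32),
    EnvConfig("EXTRA_MAX_LARGE", 12, 48),
]


def smallest_fit(min_cores: int, min_memory_gb: int) -> Optional[EnvConfig]:
    """Return the first (smallest) config meeting both requirements.

    Returns None when no standard configuration is large enough.
    """
    for config in STANDARD_CONFIGS:
        if config.cores >= min_cores and config.memory_gb >= min_memory_gb:
            return config
    return None


choice = smallest_fit(min_cores=3, min_memory_gb=6)
print(choice.name if choice else "no standard configuration fits")
```

Scanning in order works here because both cores and memory never decrease from one listed configuration to the next, so the first match is also the smallest.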
Template
A template is a building block of the transform. A template defines the logic of the data transformation, the inputs that need to be passed, and the output that will be generated.
RapidCanvas provides a library of system templates and continuously adds more to it. When the system templates are not adequate, notebook users can create custom templates to suit their project needs. Business users on the web interface of RapidCanvas see only recipes and transforms, and are limited to system templates plus any custom templates that their team of notebook users has developed and made available inside the tenant.
Artifact
Any project-related asset built by users, such as a model, can be saved as an artifact. Artifacts can be used across projects. Typically, generic and reusable assets are saved as artifacts and used in multiple projects. Currently, creating, saving, and accessing artifacts is available only to notebook users.
Scenarios
A scenario is created within a project and allows you to run a project only when certain conditions are met. Scenarios can be used to run your project with different datasets and changing parameters which go into a recipe.
Segments
Within a project, segments are subsets of your input data sets that meet user-defined conditions. Segments can be used to slice and dice your data and can be associated with scenarios. This allows you to run your scenarios with only some of the data.
Use Case - You have 12 months of historic data; however, you would like to build multiple models each with varying amounts of historic data. You can create segments with 1 month, 3 months, 6 months, 12 months of data, etc., and create corresponding scenarios to build different models and compare their performance.
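The use case above can be sketched in plain Python by slicing a year of monthly records into "most recent N months" subsets. This only illustrates the idea of a segment as a filtered subset; it is not how segments are defined in the RapidCanvas interface, and the function name, column names, and sample data are hypothetical.

```python
from datetime import date


def month_index(day):
    """Map a date to a running month count so months can be compared."""
    return day.year * 12 + (day.month - 1)


def last_n_months(rows, n, date_key="date"):
    """Return the rows that fall in the most recent n calendar months,
    counted back from the latest month present in the data."""
    latest = max(month_index(row[date_key]) for row in rows)
    return [row for row in rows if month_index(row[date_key]) > latest - n]


# Hypothetical history: one record per month for January to December 2023.
history = [{"date": date(2023, m, 1), "sales": 100 + m} for m in range(1, 13)]

# One segment per window size; each could feed its own scenario.
segments = {n: last_n_months(history, n) for n in (1, 3, 6, 12)}

for n, rows in segments.items():
    print(n, "month(s):", len(rows), "rows")
```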