Connectors

This section describes the data sources you can connect to from a notebook to fetch datasets or files: Snowflake, Mongo, Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, MySQL/MsSQL, Redshift, and Redis. Each connector follows the same pattern: create a data source, create a project, add a dataset from the source, export output datasets back with sync options, and optionally schedule a recurring job.

from utils.rc.client.requests import Requests
from utils.rc.client.auth import AuthClient

from utils.rc.dtos.project import Project
from utils.rc.dtos.dataset import Dataset
from utils.rc.dtos.recipe import Recipe
from utils.rc.dtos.transform import Transform
from utils.rc.dtos.template_v2 import TemplateV2, TemplateTransformV2
from utils.rc.dtos.segment import Segment, ItemExpression, Operator
from utils.rc.dtos.scenario import Scenario
from utils.rc.dtos.dataSource import DataSource
from utils.rc.dtos.dataSource import DataSourceType
from utils.rc.dtos.dataSource import SnowflakeConfig
from utils.rc.dtos.dataSource import MongoConfig
from utils.rc.dtos.dataSource import S3Config
from utils.rc.dtos.dataSource import GcpConfig
from utils.rc.dtos.dataSource import AzureBlobConfig
from utils.rc.dtos.dataSource import MySQLConfig
from utils.rc.dtos.dataSource import RedshiftConfig
from utils.rc.dtos.dataSource import RedisStorageConfig
# ProjectRun is used by the scheduling examples below; the exact module path
# may differ in your SDK version.
from utils.rc.dtos.projectRun import ProjectRun

import pandas as pd
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

# Optionally point the SDK at a specific host before authenticating:
# Requests.setRootHost("https://test.dev.rapidcanvas.net/api/")
# Requests.setRootHost("http://localhost:8080/api/")
AuthClient.setToken()
INFO:Authentication successful

Snowflake

Establishing a connection with the Snowflake datasource

Use this code snippet in a notebook to establish a connection with the Snowflake data source. The credential values below are placeholders; substitute your own.

dataSource = DataSource.createDataSource(
    "snowflake-101",
    DataSourceType.SNOWFLAKE,
    {
        SnowflakeConfig.USER: "user-name",
        SnowflakeConfig.PASSWORD: "password",
        # Account identifier, in the form <account_locator>.<region>.<cloud>
        SnowflakeConfig.ACCOUNT: "account-locator.us-central1.gcp"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Snowflake",
    description="Testing snowflake lib",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching data from this database and uploading it to the canvas

Use this code in a notebook to fetch data from Snowflake with a SQL query and upload the result onto the canvas as a dataset.

signup = project.addDataset(
    dataset_name="signup",
    dataset_description="signup golden",
    data_source_id=dataSource.id,
    data_source_options={
        SnowflakeConfig.WAREHOUSE: "COMPUTE_WH",
        SnowflakeConfig.QUERY: "SELECT * FROM rapidcanvas.public.SIGNUP"
    }
)
signup.getData()
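
getData() returns the dataset as a pandas DataFrame, so you can inspect the fetched data in the notebook with standard pandas calls; a minimal sketch:

df = signup.getData()   # pandas DataFrame
print(df.shape)         # (rows, columns)
print(df.dtypes)        # column types inferred from the table
df.head()               # preview the first few rows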

Exporting the output dataset to the Snowflake datasource

The following code snippet allows you to export an output dataset to the Snowflake datasource. Here, dataset refers to the output dataset node on the canvas that you want to export; the table, database, and schema names are placeholders.

dataset.update_sync_options(
    dataSource.id,
    {
      SnowflakeConfig.TABLE: "table name",
      SnowflakeConfig.DATABASE: "database name",
      SnowflakeConfig.SCHEMA: "schema name",
      SnowflakeConfig.IF_TABLE_EXISTS: "append"
    }
)
dataset.sync()
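
As an optional sanity check, you can re-import the exported table through the same data source using the addDataset call shown earlier; a sketch, assuming placeholder table, database, and schema names:

exported = project.addDataset(
    dataset_name="signup_exported",
    dataset_description="re-import of the exported table",
    data_source_id=dataSource.id,
    data_source_options={
        SnowflakeConfig.WAREHOUSE: "COMPUTE_WH",
        SnowflakeConfig.QUERY: "SELECT * FROM database_name.schema_name.table_name"
    }
)
exported.getData()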

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Snowflake. The cron expression "*/2 * * * *" below runs the job every two minutes.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    SnowflakeConfig.TABLE: "table name",
    SnowflakeConfig.DATABASE: "database name",
    SnowflakeConfig.SCHEMA: "schema name"
  }
)

Mongo

Establishing a connection with the Mongo datasource

Use this code snippet in a notebook to establish a connection with the Mongo data source. The connection string is a placeholder; substitute your own.

dataSource = DataSource.createDataSource(
    "mongo-101",
    DataSourceType.MONGO,
    {
        MongoConfig.CONNECT_STRING: "mongodb://user:password@host:27017/test"
    }
)
2023-02-02 12:07:09.094 INFO    root: Found existing data source by name: mongo-101
2023-02-02 12:07:09.095 INFO    root: Updating the same

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Mongodb",
    description="Testing mongodb lib",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)
2023-02-02 12:09:07.010 INFO    root: Found existing project by name: Test Mongodb
2023-02-02 12:09:07.011 INFO    root: Deleting existing project
2023-02-02 12:09:07.123 INFO    root: Creating new project by name: Test Mongodb

Fetching a file from the database and uploading it to the canvas

The following code snippet fetches a collection from the Mongo database and uploads it onto the canvas as a dataset.

titanic = project.addDataset(
    dataset_name="titanic",
    dataset_description="titanic golden",
    data_source_id=dataSource.id,
    data_source_options={
        MongoConfig.DATABASE: "test",
        MongoConfig.COLLECTION: "titanic",
        MongoConfig.QUERY_IN_JSON_FORMAT: "{}"
    }
)
2023-02-02 12:09:07.300 INFO    root: Creating new dataset by name:titanic
titanic.getData()
    PassengerId  Survived  Pclass  Name                              Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0   3            1         3       Heikkinen, Miss. Laina            female  26.0  0      0      STON/O2. 3101282  7.925    nan    S
1   6            0         3       Moran, Mr. James                  male    nan   0      0      330877            8.4583   nan    Q
2   7            0         1       McCarthy, Mr. Timothy J           male    54.0  0      0      17463             51.8625  E46    S
3   8            0         3       Palsson, Master. Gosta Leonard    male    2.0   3      1      349909            21.075   nan    S
4   14           0         3       Andersson, Mr. Anders Johan       male    39.0  1      5      347082            31.275   nan    S
..  ...          ...       ...     ...                               ...     ...   ...    ...    ...               ...      ...    ...
95  105          0         3       Gustafsson, Mr. Anders Vilhelm    male    37.0  2      0      3101276           7.925    nan    S
96  106          0         3       Mionoff, Mr. Stoytcho             male    28.0  0      0      349207            7.8958   nan    S
97  107          1         3       Salkjelsvik, Miss. Anna Kristine  female  21.0  0      0      343120            7.65     nan    S
98  78           0         3       Moutal, Mr. Rahamin Haim          male    nan   0      0      374746            8.05     nan    S
99  109          0         3       Rekic, Mr. Tido                   male    38.0  0      0      349249            7.8958   nan    S

100 rows × 12 columns

Exporting the output dataset to the Mongo datasource

The following code snippet allows you to export an output dataset to the Mongo datasource. The collection and database names are placeholders.

dataset.update_sync_options(
    dataSource.id,
    {
      MongoConfig.COLLECTION: "collection name",
      MongoConfig.DATABASE: "database name",
    }
)
dataset.sync()
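
As with Snowflake, you can optionally confirm the export by re-importing the collection through the same data source; a sketch with placeholder names:

exported = project.addDataset(
    dataset_name="titanic_exported",
    dataset_description="re-import of the exported collection",
    data_source_id=dataSource.id,
    data_source_options={
        MongoConfig.DATABASE: "database name",
        MongoConfig.COLLECTION: "collection name",
        MongoConfig.QUERY_IN_JSON_FORMAT: "{}"
    }
)
exported.getData()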

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Mongo.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    MongoConfig.COLLECTION: "collection name",
    MongoConfig.DATABASE: "database name",
  }
)

Amazon S3

Establishing a connection with the Amazon S3 datasource

Use this code snippet in a notebook to establish a connection with the Amazon S3 data source.

dataSource = DataSource.createDataSource(
    "s3-101",
    DataSourceType.S3_STORAGE,
    {
        S3Config.BUCKET: "bucket-name",
        S3Config.ACCESS_KEY_ID: "access-key-id",
        S3Config.ACCESS_KEY_SECRET: "access-key-secret"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Amazon S3",
    description="Testing Amazon S3",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching a file from the bucket and uploading it to the canvas

The following code snippet uploads a dataset imported from Amazon S3 onto the canvas.

signup = project.addDataset(
    dataset_name="signup",
    dataset_description="signup golden",
    data_source_id=dataSource.id,
    data_source_options={
        S3Config.FILE_PATH: "file-path"
    }
)
signup.getData()

Exporting the output dataset to the Amazon S3 datasource

The following code snippet allows you to export the output dataset to the Amazon S3 datasource.

dataset.update_sync_options(
    dataSource.id,
    {
      S3Config.OUTPUT_FILE_DIRECTORY: "files/",
      S3Config.OUTPUT_FILE_NAME: "dataset.parquet"
    }
)
dataset.sync()
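
The export writes files/dataset.parquet into the configured bucket. If you want to verify the upload outside the platform, a minimal sketch with boto3 (not part of the RapidCanvas SDK; the credentials and bucket name are the placeholders used above):

import boto3  # assumes boto3 is installed in the notebook environment

s3 = boto3.client(
    "s3",
    aws_access_key_id="access-key-id",
    aws_secret_access_key="access-key-secret"
)
response = s3.list_objects_v2(Bucket="bucket-name", Prefix="files/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])  # expect files/dataset.parquet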

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Amazon S3. The ${RUN_ID} placeholder in the output file name distinguishes the files written by successive runs.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    S3Config.OUTPUT_FILE_DIRECTORY: "files/",
    S3Config.OUTPUT_FILE_NAME: "dataset-${RUN_ID}.parquet"
  }
)

Google Cloud Storage (GCS)

Establishing a connection with the GCS datasource

Use this code snippet in a notebook to establish a connection with the Google Cloud Storage data source.

dataSource = DataSource.createDataSource(
    "gcp-101",
    DataSourceType.GCP_STORAGE,
    {
        GcpConfig.BUCKET: "bucket-name",
        GcpConfig.ACCESS_KEY: "access key path"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Google Cloud Storage",
    description="Testing Google Cloud Storage",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching a file from the bucket and uploading it to the canvas

The following code snippet uploads a dataset imported from Google Cloud Storage onto the canvas.

signup = project.addDataset(
    dataset_name="signup",
    dataset_description="signup golden",
    data_source_id=dataSource.id,
    data_source_options={
        GcpConfig.FILE_PATH: "file-path"
    }
)
signup.getData()

Exporting the output dataset to the GCS datasource

The following code snippet allows you to export the output dataset to the Google Cloud Storage (GCS) datasource.

dataset.update_sync_options(
    dataSource.id,
    {
      GcpConfig.OUTPUT_FILE_DIRECTORY: "files/",
      GcpConfig.OUTPUT_FILE_NAME: "dataset.parquet"
    }
)
dataset.sync()

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Google Cloud Storage.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    GcpConfig.OUTPUT_FILE_DIRECTORY: "files/",
    GcpConfig.OUTPUT_FILE_NAME: "dataset-${RUN_ID}.parquet"
  }
)

Azure Blob Storage

Establishing a connection with the Azure Blob datasource

Use this code snippet in a notebook to establish a connection with the Azure Blob Storage data source.

dataSource = DataSource.createDataSource(
    "azure-101",
    DataSourceType.AZURE_BLOB,
    {
        AzureBlobConfig.CONTAINER_NAME: "container-name",
        AzureBlobConfig.CONNECT_STR: "connect-string"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Azure Blob Storage",
    description="Testing Azure Blob Storage",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching a file from the container and uploading it to the canvas

The following code snippet uploads a dataset imported from Azure Blob Storage onto the canvas. The FILE_PATH option points to the file's location in the container.

project.addDataset(
    dataset_name="signup",
    dataset_description="signup golden",
    data_source_id=dataSource.id,
    data_source_options={
        AzureBlobConfig.FILE_PATH: "file-path"
    }
)

Exporting the output dataset to the Azure Blob datasource

The following code snippet allows you to export the output dataset to the Azure Blob datasource.

dataset.update_sync_options(
    dataSource.id,
    {
      AzureBlobConfig.OUTPUT_FILE_DIRECTORY: "files/",
      AzureBlobConfig.OUTPUT_FILE_NAME: "dataset.parquet"
    }
)
dataset.sync()

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Azure Blob.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    AzureBlobConfig.OUTPUT_FILE_DIRECTORY: "files/",
    AzureBlobConfig.OUTPUT_FILE_NAME: "dataset-${RUN_ID}.parquet"
  }
)

MySQL/MsSQL

Establishing a connection with the MySQL datasource

Use this code snippet in a notebook to establish a connection with the MySQL data source. The connection string is a placeholder; substitute your own.

dataSource = DataSource.createDataSource(
    "mysql-101",
    DataSourceType.MYSQL,
    {
        MySQLConfig.CONNECT_STRING: "mysql://user:password@host/database"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test MySQL/MsSQL",
    description="Testing MySQL/MsSQL",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching a table from the database and uploading it to the canvas

The following code snippet uploads a dataset queried from MySQL/MsSQL onto the canvas.

dataset = project.addDataset(
    dataset_name="titanic",
    dataset_description="titanic golden",
    data_source_id=dataSource.id,
    data_source_options={
        MySQLConfig.QUERY: "SELECT * FROM titanic limit 100"
    }
)

Exporting the output dataset to the MySQL/MsSQL datasource

The following code snippet allows you to export the output dataset to the MySQL/MsSQL datasource.

dataset.update_sync_options(
    dataSource.id,
    {
        MySQLConfig.TABLE: "titanic"
    }
)
dataset.sync()

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to MySQL/MsSQL.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    MySQLConfig.TABLE: "titanic"
  }
)

Redshift

Establishing a connection with the Redshift datasource

Use this code snippet in a notebook to establish a connection with the Redshift data source. The connection string is a placeholder; substitute your cluster endpoint and credentials.

dataSource = DataSource.createDataSource(
    "redshift-101",
    DataSourceType.REDSHIFT,
    {
        # Placeholder connect string; use your Redshift cluster endpoint,
        # database, and credentials here.
        RedshiftConfig.CONNECT_STRING: "redshift://user:password@cluster-endpoint:5439/database"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Redshift",
    description="Testing Redshift",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Fetching a table from the database and uploading it to the canvas

The following code snippet uploads a dataset queried from Redshift onto the canvas.

dataset = project.addDataset(
    dataset_name="titanic",
    dataset_description="titanic golden",
    data_source_id=dataSource.id,
    data_source_options={
        RedshiftConfig.QUERY: "SELECT * FROM titanic limit 100"
    }
)

Exporting the output dataset to the Redshift datasource

The following code snippet allows you to export the output dataset to the Redshift datasource.

dataset.update_sync_options(
    dataSource.id,
    {
        RedshiftConfig.TABLE: "titanic"
    }
)
dataset.sync()

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Redshift.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    RedshiftConfig.TABLE: "titanic"
  }
)

Redis

Establishing a connection with the Redis datasource

Use this code snippet in a notebook to establish a connection with the Redis data source.

dataSource = DataSource.createDataSource(
    "redis-101",
    DataSourceType.REDIS_STORAGE,
    {
        RedisStorageConfig.HOST: "127.0.0.1",
        RedisStorageConfig.PORT: "6379"
    }
)

Creating a project

The following code snippet is used to create a project.

project = Project.create(
    name="Test Redis",
    description="Testing Redis",
    icon="https://rapidcanvas.ai/wp-content/uploads/2022/09/bitcoin_prediction_med.jpg",
    createEmpty=True
)

Note: You cannot import files from Redis into the platform, but you can export datasets to and store them in this data source.

Exporting the output dataset to the Redis datasource

The following code snippet allows you to export the output dataset to the Redis datasource. FEATURE_KEY_COLUMN names the column whose values key the exported records, and FEATURE_VALUE_COLUMNS is a comma-separated list of the columns stored as values.

dataset.update_sync_options(
    dataSource.id,
    {
        RedisStorageConfig.FEATURE_NAME: "titanic",
        RedisStorageConfig.FEATURE_KEY_COLUMN: "PassengerId",
        RedisStorageConfig.FEATURE_VALUE_COLUMNS: "Sex,Parch"
    }
)
dataset.sync()
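
Because Redis exports cannot be read back through the platform, you can inspect the stored features with a Redis client instead. A sketch using redis-py; the key layout (keys derived from the FEATURE_NAME and the key column values) is an assumption, not documented behavior:

import redis  # assumes the redis-py package is installed

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
# Scan for keys related to the exported feature rather than guessing
# exact key names, since the layout is assumed here.
for key in r.scan_iter(match="*titanic*"):
    print(key, r.type(key))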

Scheduling a job

When a scheduled job runs, the source dataset, refreshed with new records, flows through the project's machine learning pipeline to generate a new output dataset, which is then exported to Redis.

project_run = ProjectRun.create_project_run(
  project.id, "test-run-v1", "*/2 * * * *"
)

project_run.add_project_run_sync(
  dataset.id,
  dataSource.id,
  {
    RedisStorageConfig.FEATURE_NAME: "titanic",
    RedisStorageConfig.FEATURE_KEY_COLUMN: "PassengerId",
    RedisStorageConfig.FEATURE_VALUE_COLUMNS: "Sex,Parch"
  }
)