Train Test Split
This transform splits a dataset into a training set and a testing set for building a predictive model. You specify the share of the data to hold out for testing, and the remainder is used for training.
Parameters
The list below briefly describes each parameter of the Train Test Split transform.
- Name:
By default, the transform name is populated. You can also add a custom name for the transform.
- Input Dataset:
The name of the input dataset to which the Train Test Split transform is applied. Select an uploaded dataset from the drop-down list. (Required: True, Multiple: False)
- Target column:
The target column used for predictions.
- Test size:
The share of the data to be used for testing the model. The data is split into two parts based on this value: one for testing and the other for training. The notebook snippet on this page passes the value as a fraction (0.2, that is, 20%).
- Output Train Dataset:
The name under which the training portion of the split is saved as an output dataset. (Required: True, Multiple: False)
- Output Test Dataset:
The name under which the testing portion of the split is saved as an output dataset. (Required: True, Multiple: False)
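To make the effect of the Test size parameter concrete, here is a minimal pandas sketch of a random split. It is an illustration only, not the transform's actual implementation: the function name `split_train_test`, the `seed` argument, and the toy data are assumptions introduced for this example, and the transform may shuffle or use the target column differently.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Return (train, test), where test holds roughly test_size of the rows.

    Hypothetical helper for illustration; not part of the transform.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    # Draw the test rows at random without replacement.
    test = df.sample(frac=test_size, random_state=seed)
    # Every row that was not drawn goes to the training set.
    train = df.drop(index=test.index)
    return train, test


# Toy data (assumed); "label" stands in for the target column.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})
train, test = split_train_test(df, test_size=0.2)
print(len(train), len(test))  # 8 2
```

With 10 rows and a test size of 0.2, two rows land in the test set and the other eight in the training set. The two sets never share a row.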
The sample input for this transform looks as shown in the screenshot:
[Image: testtrain_input.png]
The output of running the Train Test Split transform on the dataset is shown below. This is the training dataset:
[Image: train_output.png]
This is the test dataset produced by the split:
[Image: test_output.png]
How to use it in Notebook
Use the following code snippet in the Jupyter Notebook editor to run the Train Test Split transform:
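# Assumes dataset_input_name, dataset_w_bin_cols, targetCol, project, Transform,
# and train_test_split (the transform template) are defined earlier in the notebook.
train_ds_name = dataset_input_name + "_train"
test_ds_name = dataset_input_name + "_test"

# Configure the transform from the Train Test Split template.
transform = Transform()
transform.name = "train test split"
transform.templateId = train_test_split.id
transform.variables = {
    "inputDataset": dataset_w_bin_cols.name,
    "targetCol": targetCol,
    "test_size": 0.2,  # hold out 20% of the data for testing
    "output_train": train_ds_name,
    "output_test": test_ds_name,
}

# Create a recipe on the input dataset, attach the transform, and run it.
recipe_split = project.addRecipe([dataset_w_bin_cols], name="train test split")
# recipe_split.prepareForLocal(transform, contextId="recipe_split")
recipe_split.addTransform(transform)
recipe_split.run()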
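If you build the transform variables in several notebooks, a small helper can keep the keys and the output naming consistent. The sketch below is only a suggestion: `build_split_variables` and its arguments are hypothetical names, the dataset and column names are placeholders, and the check that the test size lies between 0 and 1 is an assumption based on the 0.2 value used above. The dictionary keys are the ones shown in the snippet.

```python
def build_split_variables(input_dataset: str, target_col: str,
                          output_prefix: str, test_size: float = 0.2) -> dict:
    """Build the variables dict for the transform (hypothetical helper)."""
    # Assumption: test_size is a fraction, as in the snippet's 0.2.
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    return {
        "inputDataset": input_dataset,
        "targetCol": target_col,
        "test_size": test_size,
        # Same "_train" / "_test" suffix convention as the snippet.
        "output_train": output_prefix + "_train",
        "output_test": output_prefix + "_test",
    }


# Placeholder names, for illustration only.
variables = build_split_variables(
    input_dataset="my_dataset_binned",
    target_col="my_target",
    output_prefix="my_dataset",
)
print(variables["output_train"], variables["output_test"])
```

The returned dictionary can then be assigned to `transform.variables` in place of the literal dictionary.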
Requirements
pandas