Split – SYNTASA™

Description

The Split process divides a dataset into two parts: a Training dataset and a Testing dataset.

Process Configuration

Two screens need to be configured for this process type:

Input
Outputs

Input

Train Ratio
- The value can be any number between 0 and 1; it is the fraction of the data assigned to Training.
- Example: with Train Ratio = 0.8, 80% of the data goes to Training and the remaining 20% goes to Testing.
Set Seed
- The seed value initializes the randomization used to split the data.
- The value must be an integer between 1 and 9999.
- By default, Syntasa sets the seed value to 1000.
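
To make the effect of these two settings concrete, here is a minimal Python sketch of a seeded random split. It is not Syntasa's implementation; the function name `split_rows`, the shuffle-then-cut approach, and the treatment of 0 and 1 as invalid ratios are assumptions made for illustration. Only the 0.8 example ratio, the 1 to 9999 seed range, and the default seed of 1000 come from the settings described above.

```python
import random


def split_rows(rows, train_ratio=0.8, seed=1000):
    """Illustrative seeded split (hypothetical helper, not Syntasa's code)."""
    # Assumption: a ratio of exactly 0 or 1 would leave one side empty, so reject it.
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    # Seed range taken from the Set Seed description: integer from 1 to 9999.
    if not isinstance(seed, int) or not 1 <= seed <= 9999:
        raise ValueError("seed must be an integer from 1 to 9999")

    rng = random.Random(seed)   # private generator, so the global state is untouched
    shuffled = list(rows)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)

    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))
train, test = split_rows(rows, train_ratio=0.8, seed=1000)
print(len(train), len(test))  # 80 20
```

In this sketch, running the split again with the same seed returns the same Training and Testing rows, which is the usual reason for exposing a seed setting.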

Outputs

The Outputs screen is where the table name is defined, along with the option to "Load to BQ" when using Google Cloud Platform or "Load to Redshift" when using Amazon Web Services.

Expected Output

This process type produces two dataset tables, Training and Testing, split according to the Train Ratio set on the Input screen. These output datasets are written to tables in the environment in which the code was run (i.e. BigQuery or Redshift).
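
One way to sanity-check the two output tables is to compare their row counts against the configured Train Ratio. The Python sketch below assumes you have already obtained the row counts yourself (for example with a count query in your warehouse). The helper names, the 1% tolerance, and the sample counts are hypothetical and chosen only for illustration; a random split may not hit the ratio exactly, which is why a tolerance is used.

```python
def observed_train_ratio(train_count, test_count):
    """Fraction of all output rows that landed in the Training table."""
    total = train_count + test_count
    if total == 0:
        raise ValueError("both tables are empty")
    return train_count / total


def ratio_matches(train_count, test_count, train_ratio, tolerance=0.01):
    """True if the observed ratio is within `tolerance` of the configured one."""
    observed = observed_train_ratio(train_count, test_count)
    return abs(observed - train_ratio) <= tolerance


# Made-up row counts for a Train Ratio of 0.8.
print(ratio_matches(8012, 1988, train_ratio=0.8))  # True
```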
