Description
The Spark KNN Learn process type provides the ability to configure and productionize custom Spark K-Nearest Neighbors (KNN) Learn code within a Syntasa Composer workflow. Configuring this process allows Syntasa to manage the learning run on a scheduled basis. There are two ways to import the code:
- Paste into a text editor window
- Upload a file with the code
After placing the code in the text editor, the output locations for the process type need to be specified.
Once the process is configured and tested, it can be deployed to production and scheduled for Syntasa to run automatically.
Process Configuration
There are two screens that need to be configured for this process type.
- Parameters
- Output
Parameters
The Parameters screen is where the custom code is imported, either by pasting it into the text editor or by uploading a file.
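For illustration only, below is a minimal sketch of the kind of code that might be pasted here; it is not Syntasa's actual template. Spark MLlib does not ship a KNN estimator, so this sketch pulls a sample from a Spark table and trains scikit-learn's KNeighborsClassifier. The table, column, and label names are hypothetical placeholders.

```python
# Hypothetical sketch of custom Spark KNN Learn code; Spark MLlib has no
# built-in KNN estimator, so scikit-learn is used on a collected sample.
from pyspark.sql import SparkSession
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

spark = SparkSession.builder.getOrCreate()

# Hypothetical input table produced by an upstream Composer process
df = spark.table("analytics.visitor_features")

feature_cols = ["visits", "page_views", "recency_days"]  # hypothetical features
label_col = "converted"                                   # hypothetical binary label

# Collect to pandas for scikit-learn training (assumes the sample fits in memory)
pdf = df.select(feature_cols + [label_col]).dropna().toPandas()
X_train, X_test, y_train, y_test = train_test_split(
    pdf[feature_cols], pdf[label_col], test_size=0.2, random_state=42
)

# Fit the K-Nearest Neighbors classifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Hold-out metrics that can later feed the learning_metrics output
preds = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, preds),
    "f1": f1_score(y_test, preds),
}
print(metrics)
```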
File Upload
- Click the icon
- A file browser window will appear
- Select the file with the code to be imported
- Click Open
- The contents of the file will be placed in the text editor window
- Also, the file name will be displayed just below the "File Upload" heading
Output
The Output screen is where the table name, display name, and model name are defined, along with the option to "Load to BQ" when using Google Cloud Platform or "Load to Redshift" when using Amazon Web Services. This process type has three outputs:
- learning_metrics, which helps in understanding model performance
- feature_importance
- model
Expected Output
The expected outputs of this process type are the model, stored in the "Base Path", and the learning_metrics and feature_importance outputs, stored in the "Location"; both locations are specified on the Output screen. Loading to BQ or Redshift makes querying the learning metrics and feature importance easier.
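As a rough illustration of how custom code might materialize these outputs, the sketch below continues the hypothetical example from the Parameters section. It writes learning_metrics and feature_importance as queryable tables and serializes the model. The paths shown are placeholders; in practice the process writes to the configured Base Path and Location, and KNN has no native feature importances, so permutation importance is used here as one common stand-in.

```python
# Continuation of the hypothetical sketch above: materialize the three outputs.
import pickle

from pyspark.sql import Row
from sklearn.inspection import permutation_importance

OUTPUT_LOCATION = "gs://example-bucket/output/knn_learn"  # placeholder for "Location"
MODEL_PATH = "/tmp/knn_model.pkl"                         # placeholder for "Base Path"

# learning_metrics: one row per metric, easy to query after a BQ/Redshift load
metrics_df = spark.createDataFrame(
    [Row(metric=name, value=float(value)) for name, value in metrics.items()]
)
metrics_df.write.mode("overwrite").parquet(f"{OUTPUT_LOCATION}/learning_metrics")

# feature_importance: permutation importance on the hold-out set
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=42)
importance_df = spark.createDataFrame(
    [Row(feature=name, importance=float(score))
     for name, score in zip(feature_cols, perm.importances_mean)]
)
importance_df.write.mode("overwrite").parquet(f"{OUTPUT_LOCATION}/feature_importance")

# model: serialize the fitted estimator so it can be stored under the Base Path
with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
```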