Description
Once the learning and scoring processes have completed, the Evaluator Process is used to produce a dataset for analytical use.
This dataset is used for viewing the accuracy of the model.
Process Configuration
There are two screens that need configuring for this process type - Parameters and Output.
Below are details of each screen and descriptions of each of the fields.
Parameters
There are three variables that need to be configured:
- Prediction Column - used to select the prediction column passed from the scoring process
- Label Column - used to select the field that is associated with the label element of the scoring process
- Evaluator Type - selects one of four types from the drop-down:
  - Binary Class Evaluator - compares two assignments of a binary attribute: the label, which serves as the reference, and the model's prediction, which is the assignment being investigated
  - Multi Class Evaluator - evaluates a model that classifies instances into one of three or more classes
  - Ranking Class Evaluator - summarizes the quality of a ranked set of predictions as a single number or score
  - Regression Evaluator - assesses whether the numerical results of a regression analysis, which quantify hypothesized relationships between variables, are acceptable as descriptions of the data
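To make the roles of the Prediction Column and Label Column concrete, the following is a minimal Python sketch of the kind of comparison a binary class evaluator performs. It is an illustration only, not Syntasa's implementation: the function name `binary_metrics`, the `positive` parameter, and the sample values are all hypothetical, and the F-measure shown here is the standard harmonic mean of precision and recall.

```python
def binary_metrics(predictions, labels, positive=1):
    """Compare predicted classes against reference labels (hypothetical helper)."""
    pairs = list(zip(predictions, labels))
    # Count the outcomes that precision and recall depend on.
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)

    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Illustrative values only: one entry per scored record.
predictions = [1, 0, 1, 1, 0, 0]  # values from the prediction column
labels = [1, 0, 0, 1, 1, 0]       # values from the label column
result = binary_metrics(predictions, labels)
print(result)
```

In this sample, two of the three positive predictions are correct and two of the three positive labels are found, so precision and recall are both two thirds.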
Output
The Output tab lets you change the default Table Name and Display Name values shown on the graph canvas. It also lets you select whether to load the output to BigQuery (BQ) if in the Google Cloud Platform (GCP), to Redshift or RDS if in Amazon Web Services (AWS), or simply to write to HDFS if using on-premise Hadoop.
Expected Output
The expected output of the Evaluator Process is the table below, created in the environment where the data is processed (e.g. AWS, GCP, on-premise Hadoop):
- Table Name - default value <tb_test_evaluation>; table using Syntasa-defined column names
- Display Name - default value <Test Evaluation>; display name of the node on the canvas
This table can be queried directly using an enterprise-provided query engine.
The resulting table contains the following fields:
- precision
- recall
- measure
- numBins
- areaUnderROC
- areaUnderPR
- partitionDate
- calc Time
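As a rough illustration of querying the evaluation table, the Python sketch below reads the most recent row by partitionDate. It makes several assumptions: an in-memory SQLite database stands in for the enterprise query engine, the column types and the helper name `latest_evaluation` are hypothetical, only a subset of the fields listed above is selected, and the inserted rows are made-up values rather than real model results. The actual connection method and SQL dialect depend on the environment (e.g. BigQuery, Redshift, or a Hadoop query engine).

```python
import sqlite3

# Subset of the fields listed above; the column types used below are assumptions.
COLUMNS = ["precision", "recall", "areaUnderROC", "areaUnderPR", "partitionDate"]


def latest_evaluation(conn, table="tb_test_evaluation"):
    """Return the most recent evaluation row as a dict, or None if the table is empty."""
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    row = conn.execute(
        f'SELECT {cols} FROM "{table}" ORDER BY "partitionDate" DESC LIMIT 1'
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row else None


# In-memory SQLite database standing in for the real query engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "tb_test_evaluation" ('
    '"precision" REAL, "recall" REAL, "areaUnderROC" REAL, '
    '"areaUnderPR" REAL, "partitionDate" TEXT)'
)
# Illustrative values only, not real evaluation results.
conn.executemany(
    'INSERT INTO "tb_test_evaluation" VALUES (?, ?, ?, ?, ?)',
    [
        (0.80, 0.70, 0.85, 0.75, "2020-01-01"),
        (0.82, 0.74, 0.88, 0.79, "2020-01-02"),
    ],
)
latest = latest_evaluation(conn)
print(latest)
```

Ordering by partitionDate works here because the sample dates are stored as ISO-formatted text; a real table may store the partition in a different type or format.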