Logistic Regression – SYNTASA™

Description

The Logistic Regression process type is used when the dependent variable is dichotomous (binary). It describes the data and explains the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. The process lets users experiment faster and productionize the model, with Syntasa managing job scheduling and execution.
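As a rough illustration of what a logistic regression model computes, the sketch below turns a weighted sum of independent variables into a probability for the binary outcome. It is generic textbook logic, not Syntasa's implementation; the coefficients, the input row, and the 0.5 threshold are hypothetical values chosen for the example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Probability that the binary label is 1 for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and one input row, for illustration only.
weights = [0.8, -1.2]
bias = 0.1
row = [1.0, 0.5]

p = predict_probability(weights, bias, row)
label = 1 if p >= 0.5 else 0  # 0.5 is a common default threshold (assumption)
```

Training a model amounts to finding the weights and bias that make these probabilities match the observed labels as closely as possible.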

Process Configuration

Algorithm Details

Elastic Net - numeric value to use for the elastic net regularization method
Iterations - number of times the process should iterate
Regularization - numeric value from 0 to 1 that controls the strength of regularization, which helps prevent overfitting
Use Important Weights - toggle on to have the model use important weights
Cross Validate - toggle on to have the model cross-validate
- Folds - number of folds (k) to use for cross-validation
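To show how settings like these usually interact, here is a minimal sketch of an elastic net penalty and a k-fold split. It assumes the commonly used form of the penalty, in which the Regularization value scales the whole penalty and the Elastic Net value mixes L1 (1.0) with L2 (0.0); whether Syntasa applies exactly this form is an assumption. The function names and sample values are hypothetical.

```python
def elastic_net_penalty(weights, regularization, elastic_net):
    """Penalty added to the training loss (common textbook form, assumed here).

    regularization: overall strength; 0 disables the penalty.
    elastic_net: mix between pure L2 (0.0) and pure L1 (1.0).
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return regularization * (elastic_net * l1 + (1.0 - elastic_net) * 0.5 * l2)

def k_fold_indices(n_rows, folds):
    """Split row indices 0..n_rows-1 into `folds` (training, validation) pairs."""
    if folds < 2 or folds > n_rows:
        raise ValueError("folds must be between 2 and n_rows")
    base, extra = divmod(n_rows, folds)
    splits, start = [], 0
    for i in range(folds):
        # The first `extra` folds take one additional row each.
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        training = [j for j in range(n_rows) if j < start or j >= start + size]
        splits.append((training, validation))
        start += size
    return splits

# Hypothetical values, for illustration only.
weights = [0.5, -2.0]
penalty = elastic_net_penalty(weights, regularization=0.1, elastic_net=0.5)
splits = k_fold_indices(n_rows=10, folds=3)
```

With cross validation enabled, the model would be trained once per fold on the training rows and scored on the held-out validation rows, so more folds mean more training runs.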

Mapping

The mapping screen defines which fields are included in the model and which of them are Label, Feature, Identifier, and/or Partitioned. Fields can be re-ordered and removed from this screen. The following actions are available through the Actions menu.

Actions

Add - add a new field
Add All - add all fields from the input source
Clear - remove all fields
Import - ingest from a file
Export - export to a file
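The field roles set on the mapping screen can be pictured as a small table of field names and roles. The sketch below is only an illustration: the field names, the lowercase role strings, the one-role-per-field simplification, and the two validation rules are all assumptions, not Syntasa's file format or its actual checks.

```python
# Hypothetical field names and role labels, for illustration only.
mapping = [
    {"field": "customer_id", "role": "identifier"},
    {"field": "event_date", "role": "partition"},
    {"field": "visits", "role": "feature"},
    {"field": "avg_order_value", "role": "feature"},
    {"field": "converted", "role": "label"},
]

def fields_with_role(mapping, role):
    """Return the names of all fields assigned the given role."""
    return [m["field"] for m in mapping if m["role"] == role]

def validate_mapping(mapping):
    """Return a list of problems; an empty list means the mapping looks usable."""
    problems = []
    # Assumed rule: a binary classifier needs a single label column.
    if len(fields_with_role(mapping, "label")) != 1:
        problems.append("exactly one label field is expected")
    # Assumed rule: the model needs at least one input column.
    if not fields_with_role(mapping, "feature"):
        problems.append("at least one feature field is expected")
    return problems

features = fields_with_role(mapping, "feature")
problems = validate_mapping(mapping)
```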

Output

The Output screen is where the table name, display name, and model name are defined, along with the option to "Load to BQ" when using Google Cloud Platform or "Load to Redshift" when using Amazon Web Services. This process type has the following three outputs. Make sure each table name is unique so that data is not overwritten.

learning_metrics
feature_importance
model
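One simple way to guard against overwriting is to check a set of proposed names for duplicates before saving the configuration. This is a minimal sketch under stated assumptions: the table names are hypothetical, and treating names as equal when they differ only by case or surrounding spaces is a cautious choice made for the example, not a documented Syntasa rule.

```python
from collections import Counter

def duplicate_names(table_names):
    """Return names used more than once, ignoring case and surrounding spaces."""
    counts = Counter(name.strip().lower() for name in table_names)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical table names for the three outputs.
outputs = {
    "learning_metrics": "lr_churn_learning_metrics",
    "feature_importance": "lr_churn_feature_importance",
    "model": "lr_churn_model",
}

duplicates = duplicate_names(outputs.values())
```

An empty result means the three names do not clash with each other; names already used by other processes would need to be included in the list to be checked as well.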

Expected Output

The expected output of this process type is the model, which is stored in the "Base Path", and the learning_metrics and feature_importance outputs, which are stored in the "Location". Both paths are shown on the Output screen. Loading to BQ or Redshift makes the learning metrics and feature importance easier to query.