The Logistic Regression process type is used when the dependent variable is dichotomous (binary). It describes data and explains the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. The process lets you experiment faster and productionize the model, with Syntasa managing job scheduling and execution.
- Elastic Net - define the numeric value to use in the elastic net regularization method
- Iterations - provide the number of times the process should iterate
- Regularization - numeric value from 0 to 1 that sets how strongly the model is penalized for complexity, to help prevent overfitting
- Use Important Weights - toggle on to have the model use important weights
- Cross Validate - toggle on to have the model cross validate
- Folds - specify the number of folds (k) to use for cross-validation
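To make the parameters above more concrete, the following is a minimal, self-contained sketch of how Elastic Net, Iterations, Regularization, and Folds typically interact in a logistic regression. It is not Syntasa's implementation. The function names, the `learning_rate` argument, and the toy data are hypothetical, and the penalty formula follows a common convention (Regularization as overall strength, Elastic Net as the mix between L1 and L2) that is assumed here rather than confirmed by the product.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, iterations=200, regularization=0.01,
                   elastic_net=0.5, learning_rate=0.5):
    """Batch gradient descent for binary logistic regression.

    Assumed penalty (a common convention, not confirmed for Syntasa):
    regularization * (elastic_net * L1 + (1 - elastic_net) / 2 * L2)
    """
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        for j in range(d):
            # Subgradient of the L1 term is sign(w); gradient of the L2 term is w.
            l1 = elastic_net * ((weights[j] > 0) - (weights[j] < 0))
            l2 = (1.0 - elastic_net) * weights[j]
            weights[j] -= learning_rate * (grad_w[j] + regularization * (l1 + l2))
        bias -= learning_rate * grad_b  # the bias is not penalized
    return weights, bias


def predict(weights, bias, x):
    # Classify as 1 when the predicted probability is at least 0.5.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


def k_fold_indices(n, folds, seed=0):
    # Shuffle row indices, then deal them into `folds` disjoint groups.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]


def cross_validate(rows, labels, folds=3, **params):
    # Train on all folds but one, score accuracy on the held-out fold, average.
    scores = []
    for held_out in k_fold_indices(len(rows), folds):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        w, b = train_logistic([rows[i] for i in train],
                              [labels[i] for i in train], **params)
        correct = sum(predict(w, b, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data: one feature, binary label.
rows = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(rows, labels, iterations=200,
                               regularization=0.01, elastic_net=0.5)
print("weights:", weights, "bias:", bias)
print("mean CV accuracy:", cross_validate(rows, labels, folds=3))
```

In this sketch, raising `regularization` shrinks the learned weights, `elastic_net` closer to 1 leans toward an L1-style penalty while values closer to 0 lean toward L2, and `folds` controls how many train/hold-out splits are averaged.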
The mapping screen lets you define the fields to include in the model and mark which fields are Label, Feature, Identifier, and/or Partitioned. Fields can be re-ordered and removed from this screen. The following actions are available through the Actions menu.
- Add - add a new field
- Add All - add all fields from the input source
- Clear - remove all fields
- Import - ingest from a file
- Export - export to a file
The Output screen is where the table name, display name, and model name are defined, along with the option to "Load to BQ" when using Google Cloud Platform or "Load to Redshift" when using Amazon Web Services. This process type has three outputs, described below. Make sure the table names are unique so that existing data is not overwritten.
The expected outputs of this process type are the model, which is stored in the "Base Path", and the learning_metrics and feature_importance outputs, which are stored in the "Location". Both paths are shown on the Output screen. Loading to BQ or Redshift makes the learning metrics and feature importance easier to query.