Description
The LTM Process is one step in building the training dataset from the Lookback process dataset.
It applies reductions, transformations, and filters to cleanse the data further.
Process Configuration
The LTM Process Configuration screen has three tabs: Parameters, Filters, and Output.
Each tab and its fields are described below.
Parameters
Three types of parameters are available for configuration: Reductions, Transformations, and Outliers.
Reductions
- Method - the reduction method to apply to the data (3 available)
  - Tfidf (term frequency-inverse document frequency) - weights each term by its importance within a document relative to the whole dataset. Configurable with a minDF, a number of features, an optional delimiter for tokenizing, and an optional n-gram sequence on the specified field
  - Tf (term frequency) - weights each term by how often it occurs. Configurable with a number of features, an optional delimiter for tokenizing, and an optional n-gram sequence on the specified field
  - One Hot Encoding - converts categorical features into binary indicator features, a format that works better with classification and regression algorithms
- Select Fields - dropdown to choose the field(s) in the Lookback dataset to which the reduction method is applied
- minDF - minimum document frequency: the minimum number of documents a term must appear in to be kept
- Features - total number of features to consider
- Delimiter - toggle on/off; when on, defines the character that delimits the values in the concatenated string
- n-gram - toggle on/off; when on, defines the length (n) of the character sequence
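The reduction methods above can be illustrated with a minimal pure-Python sketch. It is not the product's implementation: the function names (`tokenize`, `tfidf`, `one_hot`) are hypothetical, the smoothed IDF formula `log((N + 1) / (df + 1))` is an assumed variant, n-grams are assumed to be built from delimiter-separated tokens, and the Features cap is not modeled.

```python
import math
from collections import Counter

def tokenize(text, delimiter=" ", n=1):
    """Split on the delimiter, then join each run of n adjacent tokens into one term."""
    tokens = [t for t in text.split(delimiter) if t]
    return [delimiter.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, delimiter=" ", n=1, min_df=1):
    """Return one {term: weight} dict per document (assumed smoothed IDF)."""
    tokenized = [tokenize(d, delimiter, n) for d in docs]
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # minDF: keep only terms that appear in at least min_df documents.
    vocab = {term for term, count in df.items() if count >= min_df}
    n_docs = len(docs)
    weights = []
    for toks in tokenized:
        tf = Counter(t for t in toks if t in vocab)  # plain term frequency (Tf)
        weights.append(
            {t: c * math.log((n_docs + 1) / (df[t] + 1)) for t, c in tf.items()}
        )
    return weights

def one_hot(values):
    """Binarize a categorical field: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

For example, `tokenize("a,b,c", ",", 2)` yields the terms `a,b` and `b,c`, and calling `tfidf` with `min_df=2` drops any term that occurs in only one document.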
Transformations
- Method - the transformation method to apply to the data (4 available)
  - Max Abs - scales each feature individually so that its maximum absolute value in the training set is 1.0; the data is not shifted or centered
  - Bucketed Lsh (locality-sensitive hashing) - reduces dimensionality by mapping similar items to the same "buckets" with high probability
  - Min Max - a normalization that maps the minimum of a variable to 0 and the maximum to 1
  - Z Score - converts each field to a common scale with a mean of zero and a standard deviation of one
- Select Field(s) - dropdown to choose the field or fields to which the transformation is applied
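As a rough illustration of the three scaling methods, the sketch below applies each to a single numeric field. The function names are hypothetical, the use of the population standard deviation for Z Score is an assumption, and Bucketed Lsh is not covered.

```python
import statistics

def max_abs_scale(values):
    """Divide by the largest absolute value; no shifting or centering."""
    m = max(abs(v) for v in values)
    return [v / m for v in values] if m else list(values)

def min_max_scale(values):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant field: no range to scale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Rescale to mean 0 and standard deviation 1 (population std dev assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant field: every value is the mean
    return [(v - mean) / sd for v in values]
```

Max Abs preserves zeros and signs, so `max_abs_scale([-4.0, 0.0, 2.0])` gives `[-1.0, 0.0, 0.5]`, whereas Min Max and Z Score both shift the data.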
Outliers
- Method - how to handle outliers, if desired
  - Std Dev - handles outliers by specifying a number of standard deviations; values farther than that from the mean are treated as outliers for the training dataset
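One way to picture the Std Dev method is the sketch below, which keeps only rows whose value lies within a given number of standard deviations of the mean. The function name `drop_outliers`, the choice to drop (rather than cap) outlying rows, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics

def drop_outliers(rows, field, num_std_dev):
    """Keep rows whose `field` is within num_std_dev standard deviations of the mean."""
    values = [row[field] for row in rows]
    mean = statistics.fmean(values)
    limit = num_std_dev * statistics.pstdev(values)
    return [row for row in rows if abs(row[field] - mean) <= limit]

# Hypothetical data: nine typical amounts and one extreme value.
rows = [{"amount": a} for a in [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]]
kept = drop_outliers(rows, "amount", 2)
```

Note that an extreme value inflates the standard deviation itself, so a larger threshold can let the same value through.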
Filters
The Filters tab filters the dataset (that is, applies a Where clause) so that only certain data is included.
To create a filter, click the green plus button; the filter editor screen appears. Multiple filters can be applied; make sure the proper AND/OR logic is used.
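The effect of combining filters with AND versus OR can be sketched as follows. This is only an illustration of the logic, not the filter editor's behavior: the `(field, operator, value)` tuple format, the operator set, and the function names are hypothetical.

```python
import operator

# Hypothetical comparison operators a filter might support.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def matches(row, filters, logic="AND"):
    """Evaluate every (field, op, value) filter against one row."""
    results = (OPS[op](row[field], value) for field, op, value in filters)
    return all(results) if logic == "AND" else any(results)

def apply_filters(rows, filters, logic="AND"):
    """Keep only the rows that satisfy the combined filters (a Where clause)."""
    return [row for row in rows if matches(row, filters, logic)]

rows = [
    {"region": "EU", "amount": 150},
    {"region": "EU", "amount": 50},
    {"region": "US", "amount": 200},
    {"region": "US", "amount": 20},
]
filters = [("region", "=", "EU"), ("amount", ">", 100)]
both = apply_filters(rows, filters, "AND")    # rows meeting every filter
either = apply_filters(rows, filters, "OR")   # rows meeting at least one filter
```

With the same two filters, AND keeps only the first row, while OR keeps the first three, which is why the chosen logic matters when several filters are applied.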
Output
The Output tab sets the Display Name of the treatment model on the graph canvas.
Expected Output
The LTM process creates a treatment model and stores it in the Base Path. The model is applied to the history (lookback) dataset, which is then referenced in the Featurize Process.