Featurize – SYNTASA™

Description

Featurize is a composer process for performing feature engineering on input datasets. Its high-level capabilities include:

Multiple datasets can be used as Input
Filtering options

Example use cases

Building a customer-level history dataset by joining in treatment models and aggregating the result by an identifier.
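
As a rough illustration of this use case, the sketch below joins a history table to a treatment table and aggregates by customer. It uses Python's built-in sqlite3 module purely for demonstration; the table names, column names, and values are hypothetical and are not taken from Syntasa.

```python
import sqlite3

# Hypothetical data: "history" holds one row per order,
# "treatments" holds one treatment assignment per customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (customer_id INTEGER, order_total REAL);
    CREATE TABLE treatments (customer_id INTEGER, treatment TEXT);
    INSERT INTO history VALUES (1, 20.0), (1, 30.0), (2, 15.0);
    INSERT INTO treatments VALUES (1, 'email_offer'), (2, 'control');
""")

# Join the two datasets, then aggregate to one row per customer identifier.
customer_rows = conn.execute("""
    SELECT h.customer_id,
           t.treatment,
           COUNT(*) AS orders,
           SUM(h.order_total) AS total_spend
    FROM history AS h
    LEFT JOIN treatments AS t ON h.customer_id = t.customer_id
    GROUP BY h.customer_id, t.treatment
    ORDER BY h.customer_id
""").fetchall()

for row in customer_rows:
    print(row)
```

Each output row is one customer, carrying the treatment label alongside the aggregated order count and spend.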

Process Configuration

The Featurize process supports many variations of data preparation. The following is meant only to explain each screen and field.

This section provides the information Syntasa needs if more than one set of data will be joined.

Input

Join

To create a join, click the green plus button.

Joins

Join Type - left or inner join
Dataset selector - choose the dataset that will be joined with the first dataset
Alias - type a table alias if a different name is desired or required
Left Value - choose the field from the first dataset that will provide a link with the joined dataset (e.g. customer ID if joining a CRM dataset)
Operator - select how the left value should be compared with the right value; for joins this will typically be an = sign
Right Value - select the joining dataset value that is being compared with the left value
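
The join fields above correspond closely to the parts of a SQL join. The sketch below shows that correspondence and the difference between a left and an inner join, using Python's built-in sqlite3 module. The tables `events` and `crm`, their columns, and the alias names are hypothetical; this is not the SQL that Syntasa generates.

```python
import sqlite3

# Hypothetical tables: "events" plays the first dataset, "crm" the joined one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (customer_id INTEGER, page TEXT);
    CREATE TABLE crm (customer_id INTEGER, segment TEXT);
    INSERT INTO events VALUES (1, 'home'), (2, 'cart'), (3, 'help');
    INSERT INTO crm VALUES (1, 'gold'), (2, 'silver');
""")

def run_join(join_type):
    """Run the join with join_type set to "LEFT" or "INNER" (the Join Type)."""
    # "c" is the Alias, e.customer_id the Left Value,
    # "=" the Operator, and c.customer_id the Right Value.
    sql = f"""
        SELECT e.customer_id, e.page, c.segment
        FROM events AS e
        {join_type} JOIN crm AS c ON e.customer_id = c.customer_id
        ORDER BY e.customer_id
    """
    return conn.execute(sql).fetchall()

left_rows = run_join("LEFT")    # keeps customer 3, with no segment
inner_rows = run_join("INNER")  # drops customer 3, who has no CRM match
print(left_rows)
print(inner_rows)
```

A left join keeps every row of the first dataset even when the joined dataset has no match, while an inner join keeps only the rows that match on both sides.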

Mapping

The Mapping screen is where the event data fields are mapped into the Syntasa fields, desired functions get applied, and user-friendly labels get created.

This section is where the input data is defined and labeled in the Syntasa schema. Syntasa has a growing set of custom functions that can be applied, along with any Hive functions, to perform data transformation. It is recommended to consult Syntasa professional services with any questions before applying functions other than the defaults.
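
Conceptually, each mapping row pairs a source field with an optional function and a user-friendly label. The sketch below renders such rows as SQL select expressions. The field names, labels, and the `input_table` name are hypothetical, and the dictionary layout is an assumption made for illustration, not Syntasa's mapping format; `lower` and `upper` are standard Hive string functions.

```python
# Each entry: source field, optional function name, user-friendly label.
# All names here are hypothetical examples.
mappings = [
    {"field": "cust_id", "function": None, "label": "customer_id"},
    {"field": "email", "function": "lower", "label": "email_normalized"},
    {"field": "country", "function": "upper", "label": "country_code"},
]

def to_select_expression(mapping):
    """Render one mapping row as a SQL select expression."""
    source = mapping["field"]
    if mapping["function"]:
        # Wrap the source field in the chosen function, e.g. lower(email).
        source = f'{mapping["function"]}({source})'
    return f'{source} AS {mapping["label"]}'

select_clause = ",\n  ".join(to_select_expression(m) for m in mappings)
query = f"SELECT\n  {select_clause}\nFROM input_table"
print(query)
```

Fields without a function pass through unchanged and only receive a new label.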

Actions

For Featurize there are six options available:

Add - select specific fields from the input table
Add All - select all fields from the input table
Clear - clear all selected fields from the mapping canvas
Function - open the function editor to create custom fields
Import - load custom mappings when the client has JSON data available to provide them
Export - export the existing mapping schema in .csv format to assist in editing or manipulating the schema; the updated file can then be used to load an updated schema into the dataset

To Add field(s):

Click Actions button
Click Add
The Select Field(s) menu is presented
Select the field(s)
Click Apply

To Add All:

Click Actions button
Click Add All
All fields from the input table are now populated in the mapping schema

To Clear:

Click Actions button
Click Clear
All fields in the mapping schema are cleared out

To apply Function:

Click Actions button
Click Function
The Select Function and Select Field(s) editor is displayed
Click Select Function to scroll through the list, or begin typing the desired function and select it
Click Select Field(s) to add one or more fields to which the function will be applied
Click Apply
The field(s) are populated in the mapping schema

To perform Import:

Click Actions button
Click Import
Click on the green paperclip icon to browse to the desired file to import
Once the file is selected, click Open
Click Apply
Wait 60 seconds to ensure the process of pulling in mappings and labels is complete
Use the scroll, order and search options to locate the cust_fields and cust_metrics fields and confirm that all the report suite custom fields have been mapped

To perform Export:

Click Actions button
Click Export
The file syntasa_mapping_export.csv is created and downloaded for the user

Filters

Filters gives the user the ability to filter the dataset (apply a Where clause) if required.

To create a filter:

Click the Apply Where Clause button to enable filter editing
The filter editor screen appears
Select the appropriate Left Value from the drop-down list, or click --Function Editor-- to create and apply a custom function
Select the appropriate Operator from the drop-down list
Select the desired Right Value for the filter from the drop-down list, or click --Function Editor-- to create and apply a custom function
Multiple filters can be applied
When adding additional filters, ensure the proper AND/OR logic is applied
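
To see why the AND/OR logic matters, the sketch below runs two Where clauses that use the same three conditions but group them differently. It relies on standard SQL precedence, where AND binds more tightly than OR, and uses Python's built-in sqlite3 module. The `visits` table and its columns are hypothetical, and how the Syntasa filter editor groups conditions is not shown here.

```python
import sqlite3

# Hypothetical dataset to filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (customer_id INTEGER, country TEXT, device TEXT, purchases INTEGER);
    INSERT INTO visits VALUES
        (1, 'US', 'mobile', 0),
        (2, 'US', 'desktop', 3),
        (3, 'CA', 'mobile', 2),
        (4, 'CA', 'desktop', 0);
""")

def matching_ids(where_clause):
    """Return the customer IDs that pass the given Where clause."""
    sql = f"SELECT customer_id FROM visits WHERE {where_clause} ORDER BY customer_id"
    return [row[0] for row in conn.execute(sql)]

# Without parentheses, AND is evaluated first:
# country = 'US' OR (device = 'mobile' AND purchases > 0)
ungrouped = matching_ids("country = 'US' OR device = 'mobile' AND purchases > 0")

# With explicit grouping, the OR is evaluated first.
grouped = matching_ids("(country = 'US' OR device = 'mobile') AND purchases > 0")

print(ungrouped)  # [1, 2, 3]
print(grouped)    # [2, 3]
```

Customer 1 passes the first filter only because the country condition alone is enough; once the OR is grouped, the purchases condition applies to every row.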

Outputs

The Outputs tab provides the ability to name the table and set its display name on the graph canvas. It also lets the user select whether to load to BigQuery (BQ) if in the Google Cloud Platform (GCP), load to Redshift or RDS if in Amazon Web Services (AWS), or simply write to HDFS if using on-premise Hadoop.

Expected Output

The expected output of the Featurize process is the table below, within the environment where the data is processed (e.g. AWS, GCP, on-premise Hadoop):

output_table (name is configurable) - table using Syntasa-defined column names

This table can be queried directly using an enterprise-provided query engine.

Additionally, the table can serve as the foundation for building other datasets, such as Syntasa custom-built datasets.
