The Generic Event Enrich process applies functions to the data, joins lookup datasets, and writes the result to an event-level dataset. This dataset is the foundation for building the session, product, and visitor datasets, and can also be used directly for analysis and for user-defined analytics datasets.
Process Configuration
The Generic Event Enrich editor is organized into a series of screens that provide the ability to join multiple datasets, map raw fields to the schema, apply filters, and control where the output data is written. Each screen and its fields are described below.
Click on the Generic Event Enrich node to access the editor.
General
Process Name - Provide a descriptive name for the process, ideally beginning with a verb.
Join
This screen captures the information Syntasa needs when more than one dataset will be joined; a sketch of the resulting join logic follows the field list below.
Joins
To create a join, click the green plus button.
- Join Type - currently either a left join or an inner join
- Dataset selector - choose the dataset that will be joined to the first dataset
- Alias - type a table alias if a different name is desired or required
- Left Value - choose the field from the first dataset that links it to the joined dataset
- Operator - select how the left value should be compared with the right value; for a join this is typically the equals sign (=)
- Right Value - select the field from the joined dataset that is compared with the left value
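For illustration only, a left join configured with an alias of lkp and an equality operator behaves roughly like the HiveQL below; the table and column names are hypothetical, chosen purely for the sketch.

```sql
-- Hypothetical sketch of the join produced by the configuration above.
-- raw_events is the first dataset; geo_lookup is the joined dataset,
-- aliased as lkp. Left Value = e.ip_address, Operator = '=',
-- Right Value = lkp.ip_address.
SELECT e.*,
       lkp.country,
       lkp.region
FROM raw_events e
LEFT JOIN geo_lookup lkp
  ON e.ip_address = lkp.ip_address;
```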
Mapping
This screen is where the raw data schema is mapped into the Syntasa schema and where user-defined or other labels are applied to the resulting columns; a sketch of a typical mapping expression follows the field list below.
Syntasa has a growing set of custom functions that can be applied along with any Hive functions to perform data transformations.
It is recommended to consult Syntasa professional services with any questions before applying anything other than the default functions.
- Name - the fixed Syntasa table column names
- Label - customizable, user-friendly names for the columns
- Function - the predefined or custom function that maps a raw file field into the Syntasa column
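A Function entry is essentially a Hive expression over the raw fields. As a hypothetical illustration (the raw field names and conversions below are assumptions, not defaults), two mappings might look like this when expressed as HiveQL:

```sql
-- Hypothetical mapping expressions; raw field names are assumed.
-- Name: event_ts -> parse a raw timestamp string into a standard form
-- Name: page_url -> normalize the raw URL field
SELECT
  from_unixtime(unix_timestamp(raw_hit_time, 'yyyy-MM-dd HH:mm:ss')) AS event_ts,
  lower(trim(raw_page_url))                                          AS page_url
FROM raw_events;
```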
Actions
For Generic Event Enrich, two options are available: Import and Export. Import is used to load a custom mapping schema that the user has created as a .csv file (e.g. in Excel). Export downloads the existing mapping schema as a .csv file, which can be edited or manipulated and then re-imported to update the schema in the dataset.
To perform Import:
- Click the Actions button
- Click Import
- Click the green paperclip icon and browse to the desired file to import
- Once a file is selected, click Open
- Click Apply
- Wait about 60 seconds to ensure the mappings and labels have finished loading
- Use the scroll, order, and search options to locate the Cust fields and Cust metrics fields and confirm that all of the report suite's custom eVars, props, and events have been mapped
To perform Export:
- Click the Actions button
- Click Export
- A file named syntasa_mapping_export.csv will be created and downloaded; a hypothetical sketch of its contents is shown below
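The exported file mirrors the mapping grid. The layout below is only a hypothetical illustration; the actual column headers and values may differ by version, so use a real export from your environment as the editing template.

```csv
Name,Label,Function
page_url,Page URL,lower(trim(raw_page_url))
cust_field_1,Campaign Code,evar1
```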
Filters
Filters provide the user with the ability to restrict the dataset (i.e. apply a WHERE clause); a sketch of the resulting logic follows the steps below.
To create a filter:
- Click the green plus button
- The filter editor screen will appear
- Ensure the proper logic (AND/OR) is applied
- Select the appropriate Left Value from the drop-down list, or click --Function Editor-- to create and apply a custom function
- Select the appropriate Operator from the drop-down list
- Select the desired Right Value for the filter from the drop-down list, or click --Function Editor-- to create and apply a custom function
- Multiple filters can be created and applied
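Conceptually, two filters combined with AND translate to a WHERE clause like the one below; the table and column names are hypothetical.

```sql
-- Hypothetical sketch: two filters combined with AND.
-- Filter 1: Left Value = country,  Operator = '=',           Right Value = 'US'
-- Filter 2: Left Value = page_url, Operator = 'is not null'
SELECT *
FROM raw_events
WHERE country = 'US'
  AND page_url IS NOT NULL;
```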
Outputs
The Outputs screen provides the ability to name the output table and to define how the output should be labeled on the app graph; a sketch of the resulting table shape follows the field descriptions below.
Dataset
Table Name - This defines the name of the database table where the output data will be written. Ensure that the table name is unique among all tables within the defined event store; otherwise, data previously written by another process will be overwritten.
Display Name - The label shown on the process output icon on the app graph canvas.
Partition Scheme - Defines how the output table should be stored in a segmented fashion. Options are Daily, Hourly, and None. Daily is typically chosen.
File Format - Defines the format of the output file. Options are Avro, Orc, Parquet, and Textfile.
Field Delimiter - Only available when Textfile is selected as the file format; this defines how the fields should be separated, e.g. \t for tab-delimited.
Load To BQ - This option is only relevant to Google Cloud Platform deployments. BQ stands for BigQuery, and this option allows a BigQuery table to be created from the output. On AWS deployments, the option appears as Load To Redshift; on-premise installations normally write data to HDFS and do not display a Load To option.
Location - Automatically generated and not editable; derived from the paths and settings of the event store the app is created in, the key of the app, and the table name given above.
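As a rough illustration, an output configured with a Daily partition scheme and Parquet file format corresponds to a table shaped like the HiveQL below; the column list, partition column name, and location path are assumptions made only for the sketch.

```sql
-- Hypothetical shape of the output table; columns, partition column,
-- and location are assumed for illustration.
CREATE EXTERNAL TABLE tb_event (
  event_ts   STRING,
  visitor_id STRING,
  page_url   STRING
)
PARTITIONED BY (event_date STRING)  -- Daily partition scheme
STORED AS PARQUET                   -- File Format = Parquet
LOCATION 's3://event-store/apps/my_app/tb_event';  -- auto-generated Location
```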
Expected Output
The expected output of the Generic Event Enrich process is the following pair of tables, created within the environment where the data is processed (e.g. AWS, GCP, on-premise Hadoop):
- tb_event - event-level table using Syntasa-defined column names
- vw_event - view built on tb_event, providing user-friendly labels
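For example, downstream processes typically read the underlying table, while analysts can query either one; the column name below is hypothetical.

```sql
-- Count events per day from the output table (column name assumed).
SELECT event_date,
       COUNT(*) AS events
FROM tb_event
GROUP BY event_date;
```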