The Lookahead process creates the label dataset, which contains the outcomes to predict. A label is a defined success event that the model learns from. It is possible to have more than one label for modeling.
This section describes how to set up the Lookahead process.
Click on the Lookahead node to access the editor. The Lookahead Process Configuration screen has four tabs: Input, Mapping, Filters, and Outputs.
Below are details of each screen and descriptions of each of the fields.
The input screen defines the dataset(s) to use as the input.
- Primary Source - the first dataset connected on the graph appears by default; click the down arrow to select a different dataset.
- Alias - type a table alias if a different name is desired or required.
To create a join, click the green plus button.
- Join Type - left or inner join
- Source - choose the dataset that will be joined with the first dataset
- Alias - type a table alias if a different name is desired or required
- Left Value - choose the field from the first dataset that links to the joined dataset (e.g. customer ID if joining a CRM dataset)
- Operator - select how the left value should be compared with the right value; for joins this will typically be an = sign
- Right Value - select the value from the joining dataset that is compared with the left value
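The join fields above correspond closely to a SQL join clause. The following is a minimal sketch only, using Python's built-in sqlite3 module rather than anything Syntasa-specific. The table names (events, crm), the aliases (e, c), and the columns are hypothetical and chosen purely for illustration.

```python
import sqlite3

def join_datasets(join_type="LEFT"):
    """Join a hypothetical primary dataset to a hypothetical CRM dataset."""
    # Join Type: the editor offers left or inner joins.
    if join_type not in ("LEFT", "INNER"):
        raise ValueError("join_type must be LEFT or INNER")

    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data; c2 has no matching CRM record.
    conn.executescript("""
        CREATE TABLE events (customer_id TEXT, event_date TEXT);
        CREATE TABLE crm (customer_id TEXT, segment TEXT);
        INSERT INTO events VALUES ('c1', '2024-01-01'), ('c2', '2024-01-01');
        INSERT INTO crm VALUES ('c1', 'gold');
    """)
    # Primary Source = events (Alias e), Source = crm (Alias c),
    # Left Value = e.customer_id, Operator = '=', Right Value = c.customer_id
    query = f"""
        SELECT e.customer_id, e.event_date, c.segment
        FROM events AS e
        {join_type} JOIN crm AS c
          ON e.customer_id = c.customer_id
        ORDER BY e.customer_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

With this sample data, a left join keeps both rows from the first dataset (the unmatched one has an empty segment), while an inner join keeps only the customer that exists in both datasets.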
The mapping screen defines the fields, allows functions to be applied, and defines the identifier(s) and partition(s).
- Lookahead Window Length - specifies the number of days the process should look forward from the processing date.
- Lookahead Lag - specifies number of days
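To make the two settings concrete, the date arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the product's documented behavior: it assumes the lag is the number of days skipped after the processing date before the window starts, and that the window length counts days inclusively. The function name lookahead_window is hypothetical.

```python
from datetime import date, timedelta

def lookahead_window(processing_date, window_length, lag=0):
    """Return the (first_day, last_day) of the lookahead window.

    Assumptions (not confirmed by the product documentation):
    - the window starts `lag` days after the processing date
    - the window spans `window_length` days, counted inclusively
    """
    if window_length < 1:
        raise ValueError("window_length must be at least 1 day")
    first_day = processing_date + timedelta(days=lag)
    last_day = first_day + timedelta(days=window_length - 1)
    return first_day, last_day
```

Under these assumptions, a processing date of 2024-01-01 with a window length of 7 and a lag of 2 would look at events from 2024-01-03 through 2024-01-09.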
For Lookahead there are six options available: Add, Add All, Clear, Function, Import and Export.
- Add - selects specific fields from the input table.
- Add All - selects all fields from the input table.
- Clear - clears all selected fields from the mapping canvas.
- Function - opens the function editor to create custom fields.
- Import - imports custom mappings when the client has them available as JSON data.
- Export - exports the existing mapping schema in .csv format to assist in editing or manipulating the schema. The updated file can then be used to load an updated schema into the dataset.
- Order - column ordering
- Name - specified name of the column
- Function - map a field directly or apply a function such as max(), sum(), or a case statement, to name just a few
- Identifier - field to aggregate on
- Partition - column to partition the data on
To switch a field to Identifier or Partition, click in the corresponding cell and select the checkbox.
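The interplay between Function, Identifier, and Partition can be sketched in plain Python. This is only an illustration of the idea of aggregating a label per identifier within a partition; the field names (visitor_id, event_date, purchases) and the label name (purchased) are hypothetical, and the actual process runs on the platform's own engine.

```python
from collections import defaultdict

# Hypothetical input rows; field names are for illustration only.
rows = [
    {"visitor_id": "v1", "event_date": "2024-01-01", "purchases": 1},
    {"visitor_id": "v1", "event_date": "2024-01-01", "purchases": 0},
    {"visitor_id": "v2", "event_date": "2024-01-01", "purchases": 0},
]

def build_labels(rows):
    """Aggregate one label row per (identifier, partition) pair."""
    grouped = defaultdict(list)
    for row in rows:
        # Identifier = visitor_id, Partition = event_date
        key = (row["visitor_id"], row["event_date"])
        grouped[key].append(row["purchases"])

    labels = []
    for (visitor_id, event_date), purchases in sorted(grouped.items()):
        labels.append({
            "visitor_id": visitor_id,
            "event_date": event_date,
            # Function: a case statement wrapped around max(), i.e.
            # 1 if the visitor purchased at least once, otherwise 0
            "purchased": 1 if max(purchases) > 0 else 0,
        })
    return labels
```

Here the two rows for v1 collapse into a single label row, which is the effect of marking visitor_id as the Identifier.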
The Filters tab provides the ability to filter the dataset (that is, apply a Where and/or Having clause) to include only certain data.
To create a filter, click the green plus button and the filter editor screen will appear. Multiple filters can be applied; ensure the proper (AND/OR) logic is applied.
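The difference between a Where filter (applied to rows before aggregation) and a Having filter (applied to aggregated results), and the effect of grouping AND/OR conditions, can be sketched with Python's built-in sqlite3 module. The table, columns, and values below are hypothetical and are not part of the product.

```python
import sqlite3

def filtered_counts(min_purchases):
    """Apply a row-level (WHERE) and an aggregate-level (HAVING) filter."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data for illustration only.
    conn.executescript("""
        CREATE TABLE events (
            visitor_id TEXT, country TEXT, channel TEXT, purchases INTEGER
        );
        INSERT INTO events VALUES
            ('v1', 'US', 'web',   1),
            ('v1', 'US', 'web',   1),
            ('v2', 'CA', 'app',   1),
            ('v3', 'FR', 'web',   1),
            ('v4', 'US', 'store', 0);
    """)
    query = """
        SELECT visitor_id, SUM(purchases) AS total_purchases
        FROM events
        -- Parentheses make the OR apply to country only,
        -- before the AND with channel.
        WHERE (country = 'US' OR country = 'CA')
          AND channel IN ('web', 'app')
        GROUP BY visitor_id
        -- HAVING filters on the aggregated value.
        HAVING SUM(purchases) >= ?
        ORDER BY visitor_id
    """
    result = conn.execute(query, (min_purchases,)).fetchall()
    conn.close()
    return result
```

In this sample, v3 and v4 are removed by the Where conditions, and raising min_purchases from 1 to 2 additionally removes v2 through the Having condition.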
The Outputs tab provides the ability to name the table and its display name on the graph canvas, and to select whether to load to BigQuery (BQ) if in Google Cloud Platform (GCP), load to Redshift or RDS if in Amazon Web Services (AWS), or simply write to HDFS if using on-premise Hadoop.
The expected output of the Lookahead process is the table below, within the environment in which the data is processed (e.g. AWS, GCP, on-premise Hadoop):
- Table Name (default value: tb_visitor_label) - table using Syntasa-defined column names
- Display Name (default value: Visitor Label) - display name of the node on the canvas
This table can be queried directly using an enterprise-provided query engine.
Additionally, the table can serve as the foundation for building other processes within the Syntasa Composer environment.
After the process is configured, it is highly recommended to test the configured process.
- Click the down arrow to close the process configuration screen
- Save and Lock the canvas
- Shift-click on the process
- Click the Test button
- Run for one day using Overwrite mode (ensure the day being run exists in the input dataset)
- Click on the Operations screen to track the job progress
- After a successful test, move on to the next process