The Lookahead process creates the label dataset, which contains the outcomes to predict. A label is a defined success event that the model learns from. A model can have more than one label.
Process Configuration
Let's cover each tab shown when you click on the Lookahead node to access the editor.
Join
The top of the Join screen defines the dataset(s) to use as input, using the following two fields:
- Primary Source: The first dataset connected to the graph will appear by default. Click the down arrow to select a different dataset.
- Alias: Type a table alias if a different name is desired or required.
This section provides the information Syntasa needs if you are joining more than one set of data. Here are the steps to create a new join:
- Go to the App and navigate to Development >> Workflow.
- Click on the 'LookAhead' process.
- Select the 'Join' tab.
- Click the Plus (+) icon shown on the screen.
The configurable fields are as follows:
- Join Type: Choose between a left or inner join.
- Source: Select the dataset that will be joined with the first dataset.
- Alias: Type a table alias if a different name is desired or required.
- Left Value: Choose the field from the first dataset that will link with the joined dataset (e.g., customer ID if joining a CRM dataset).
- Operator: Select how the left value should be compared with the right value; for joins, this will typically be an equals sign (=).
- Right Value: Select the value from the joining dataset that is being compared with the left value.
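The join fields above correspond closely to a SQL join clause. The following is a minimal sketch of that correspondence using Python's built-in sqlite3 module. The tables `events` and `crm`, their columns, and the helper `run_join` are hypothetical and exist only for illustration; this is not the SQL that Syntasa generates.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the primary and joined datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (customer_id INTEGER, event TEXT);
    CREATE TABLE crm (customer_id INTEGER, segment TEXT);
    INSERT INTO events VALUES (1, 'visit'), (2, 'purchase'), (3, 'visit');
    INSERT INTO crm VALUES (1, 'gold'), (2, 'silver');
""")

def run_join(join_type):
    """Run the same join with join_type set to 'LEFT' or 'INNER'."""
    query = f"""
        SELECT e.customer_id, e.event, c.segment   -- e and c play the role of Alias
        FROM events AS e                           -- Primary Source
        {join_type} JOIN crm AS c                  -- Join Type and Source
          ON e.customer_id = c.customer_id         -- Left Value, Operator, Right Value
        ORDER BY e.customer_id
    """
    return conn.execute(query).fetchall()

left_rows = run_join("LEFT")    # keeps customer 3, with segment set to None
inner_rows = run_join("INNER")  # drops customer 3, which has no CRM match
print(left_rows)
print(inner_rows)
```

A left join keeps every row of the first dataset even when no match exists in the joined dataset, while an inner join keeps only the matched rows.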
Mapping
The Mapping screen defines the fields, allows the application of functions, and sets the identifiers and partitions.
This screen shows two configurable fields:
- Lookahead Window Length: This field determines the number of days of data to be included in the lookahead window. For example, in the above screenshot, the value is set to 7, which means the model will look at the next 7 days of data.
- Lookahead Lag: This field specifies a delay between the current day and the start of the lookahead window. For example, the value in the above screenshot is 1, so the lookahead window starts on the day after the current day (e.g., if today is July 11th, the lookahead window starts on July 12th).
In essence, the lookahead window and lag together determine which future period of data the model is trained on, which can be helpful for tasks like forecasting. The lag helps account for delays in data processing or transmission. For instance, if it takes 2 days to receive data from an external source, you might set a lag of 2 days so that the window reflects the data that is actually available.
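The date arithmetic behind the window length and lag can be sketched as follows. The function names `lookahead_window` and `label_for` are hypothetical, the year 2024 is arbitrary, and the sketch assumes the window includes both its start and end day and that a label is 1 when any success event falls inside the window. Syntasa's exact boundary handling may differ.

```python
from datetime import date, timedelta

def lookahead_window(current_day, window_length, lag):
    """Return the (start, end) dates of the lookahead window, both inclusive."""
    start = current_day + timedelta(days=lag)
    end = start + timedelta(days=window_length - 1)
    return start, end

def label_for(current_day, event_days, window_length, lag):
    """Return 1 if any success event date falls inside the window, else 0 (assumed rule)."""
    start, end = lookahead_window(current_day, window_length, lag)
    return int(any(start <= d <= end for d in event_days))

# Window length 7 and lag 1, evaluated for July 11th.
start, end = lookahead_window(date(2024, 7, 11), window_length=7, lag=1)
print(start, end)  # 2024-07-12 2024-07-18

# A success event on July 15th falls inside the window, so the label is 1.
print(label_for(date(2024, 7, 11), [date(2024, 7, 15)], window_length=7, lag=1))  # 1
```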
For LookAhead, there are six actions available:
- Add - Add is used to select specific fields from the input table.
- Add All - Add All will select all fields from the input table.
- Clear - Clear will clear all selected fields from the mapping canvas.
- Function - Function opens the function editor to create custom fields.
- Import - Import is used if the client has JSON data available to provide the custom mappings. (Note: Wait 60 seconds to ensure the process of pulling in mappings and labels is complete.)
- Export - Export saves the existing mapping schema in .csv format so that the schema can be edited outside the application. The updated file can then be used to load an updated schema into the dataset.
Mapping Output
Here is the list of columns shown as mapping output on the screen:
- Order: Specifies the sequence or position of the column in the output.
- Name: The custom or specified name given to the column.
- Function: Defines how the data in the column is manipulated or processed, such as applying aggregation functions like max() and sum(), or performing conditional operations using case statements.
- Identifier: This column serves as a key field for aggregation purposes.
- Partition: Indicates the column used to partition the data.
To switch a field to Identifier or Partition, click in the corresponding cell and select the checkbox.
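One way to read the Function and Identifier columns is as the select expressions and the grouping key of an aggregate query. The sketch below illustrates that reading with Python's built-in sqlite3 module. The `visits` table and every column name are hypothetical, and treating the Identifier as a GROUP BY key is an interpretation of "key field for aggregation", not a confirmed description of Syntasa's internals.

```python
import sqlite3

# Hypothetical input table: one row per visitor per day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (visitor_id TEXT, event_date TEXT, purchased INTEGER, revenue REAL);
    INSERT INTO visits VALUES
        ('a', '2024-07-12', 0, 0.0),
        ('a', '2024-07-13', 1, 25.0),
        ('b', '2024-07-12', 0, 0.0);
""")

rows = conn.execute("""
    SELECT visitor_id,                              -- Identifier (assumed grouping key)
           max(purchased) AS purchase_label,        -- Function: max()
           sum(revenue)   AS total_revenue,         -- Function: sum()
           CASE WHEN sum(revenue) > 0
                THEN 'buyer' ELSE 'browser' END AS visitor_type  -- Function: case statement
    FROM visits
    GROUP BY visitor_id
    ORDER BY visitor_id
""").fetchall()
print(rows)  # [('a', 1, 25.0, 'buyer'), ('b', 0, 0.0, 'browser')]
```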
Filters
Filters provide the ability to filter the dataset (i.e., apply a WHERE and/or HAVING clause) to include only certain data.
Steps to create a filter:
- Toggle on the "Apply Where Filter" or "Apply Having Filter" to enable filter editing.
- The filter editor screen will appear.
- Select the appropriate Left Value from the drop-down list or click "--Function Editor--" to create and apply a custom function.
- Select the appropriate Operator from the drop-down list.
- Select the desired Right Value for the filter from the drop-down list or click "--Function Editor--" to create and apply a custom function.
- Multiple filters can be applied.
- Ensure the proper (AND/OR) logic is applied when adding additional filters if required.
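The difference between the two filter types can be sketched in SQL terms: a WHERE filter removes rows before aggregation, while a HAVING filter removes groups after aggregation. The example below uses Python's built-in sqlite3 module. The `orders` table, its columns, and the threshold are hypothetical.

```python
import sqlite3

# Hypothetical input table of individual orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, country TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'US', 10.0), (1, 'US', 50.0),
        (2, 'US', 5.0),
        (3, 'UK', 100.0);
""")

rows = conn.execute("""
    SELECT customer_id, sum(amount) AS total
    FROM orders
    WHERE country = 'US'        -- Where filter: Left Value, Operator, Right Value on raw rows
    GROUP BY customer_id
    HAVING sum(amount) > 20     -- Having filter: condition on the aggregated value
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 60.0)]
```

Customer 3 is removed by the WHERE filter before any totals are computed, and customer 2 is removed by the HAVING filter because the aggregated total is too small.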
Output
The Outputs tab offers the following capabilities:
- Naming the table and setting its displayed name on the graph canvas.
- Selecting the destination for data loading:
  - Loading to BigQuery (BQ) if using Google Cloud Platform (GCP).
  - Loading to Redshift or RDS if using Amazon Web Services (AWS).
  - Writing to HDFS if using an on-premise Hadoop environment.
Expected Output
The expected output of the Lookahead process is the following table within the environment where the data is processed (e.g., AWS, GCP, on-premise Hadoop):
- Table Name: <tb_visitor_label> (default value) - table using Syntasa-defined column names.
- Display Name: <Visitor Label> (default value) - display name of node on canvas.
This table can be queried directly using an enterprise-provided query engine.
Additionally, the table can serve as the foundation for building other processes within the Syntasa Composer environment.