The Register Identity process gives the user the ability to identify and register identity pairs with the Syntasa Identity Graph. An identity graph's contents include the identity pair, the first time that pair was observed, and whether or not that pair is suspect. The identity pair consists of a Local ID and a Universal ID.
- This process expects only one input dataset.
- Required information:
  - Identity pair
  - Event store
  - Time when the identity pair was observed
Process Configuration
The Register Identity process has only two screens (Parameters and Output) and requires very little configuration.
Parameters
This screen is where the two IDs are selected, along with a field that defines time. When an ID pair is located, it is assigned a timestamp, and the output dataset contains only one record for that ID pair, reflecting the very first time the pair was seen.
Local Id - select an available field from the dropdown, which is populated based on the input dataset. The Local Id is typically an anonymous ID such as a cookie field.
Universal Id - select an available field from the dropdown, which is populated based on the input dataset. The Universal Id is typically a static ID, such as a customer ID, that is unique to a known (i.e. logged-in) user.
First Seen Time - select an available field from the dropdown, which is populated based on the input dataset. This time field is used to determine when the ID pairing was first found in the data.
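The first-seen rule described above can be illustrated with a small sketch. This is not Syntasa's implementation; it is a plain Python illustration, assuming the input rows are dictionaries and using hypothetical field names (`cookie_id`, `customer_id`, `event_time`) in place of whatever fields are chosen for Local Id, Universal Id, and First Seen Time. It also assumes that a pair exists only when both IDs are present on the same row.

```python
from datetime import datetime

# Hypothetical input rows; the field names stand in for the fields
# selected as Local Id, Universal Id and First Seen Time.
events = [
    {"cookie_id": "c1", "customer_id": "u1", "event_time": datetime(2023, 1, 5)},
    {"cookie_id": "c1", "customer_id": "u1", "event_time": datetime(2023, 1, 2)},
    {"cookie_id": "c2", "customer_id": "u1", "event_time": datetime(2023, 1, 3)},
    {"cookie_id": "c3", "customer_id": None, "event_time": datetime(2023, 1, 4)},
]


def first_seen_pairs(rows, local_field, universal_field, time_field):
    """Return one entry per (local, universal) pair with its earliest time."""
    first_seen = {}
    for row in rows:
        local_id = row.get(local_field)
        universal_id = row.get(universal_field)
        # Assumption: a pair exists only when both IDs are on the same row,
        # so anonymous rows (no Universal Id) are skipped.
        if local_id is None or universal_id is None:
            continue
        pair = (local_id, universal_id)
        ts = row[time_field]
        # Keep only the earliest timestamp seen for this pair.
        if pair not in first_seen or ts < first_seen[pair]:
            first_seen[pair] = ts
    return first_seen


pairs = first_seen_pairs(events, "cookie_id", "customer_id", "event_time")
for (local_id, universal_id), ts in sorted(pairs.items()):
    print(local_id, universal_id, ts.date())
```

Although `c1`/`u1` appears twice, it yields a single entry carrying the earlier of its two timestamps, matching the "one reference per ID pair" behavior.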
Output
This screen is where the user can define the dataset names and other environment-specific options for data availability.
Datasets
Table Name - defines the name of the database table where the output data will be written. Ensure that the table name is unique among all tables within the defined Event Store; otherwise, data previously written by another process will be overwritten.
Display Name - label of the process output icon displayed on the app graph canvas.
Load To BQ - this option is only relevant to Google Cloud Platform deployments. BQ stands for BigQuery, and this option allows a BigQuery table to be created. On AWS deployments, the option appears as Load To Redshift instead. On-premise installations normally write data to HDFS and do not display a Load To option.
Compression - option to compress the files, reducing the amount of storage required. Compression adds some processing overhead, but if raw files will be stored indefinitely, compressing them is recommended. If raw files will be removed after Event Enrich processing, it is recommended to turn Compression off.
Event Store Name - name of the Event Store selected when the app was initially created. This option is not configurable; if any of the Event Store Name, Database, or Location details are incorrect, back out of the app and make the changes in the Event Stores settings screen.
Database - name of the database in the Event Store to which data will be written.
Location - storage bucket or HDFS location where source raw files will be stored for the Syntasa Event Enrich process to use.
Expected Output
The data will be written to database tables that are accessible via a query engine. The typical expected behavior is a many-to-one relationship between Local ID and Universal ID, where many Local IDs map to one Universal ID. For example, a customer who deletes cookies frequently is assigned a new Visitor ID each time they return to the site after the cookie was deleted.
In cases where many Universal IDs map to one Local ID, those ID pairings are flagged by setting the Suspect field to TRUE.
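The Suspect rule can be expressed as a count of distinct Universal IDs per Local ID. The sketch below is a plain Python illustration, not the product's implementation, and assumes every pairing of an affected Local ID is flagged (the text says "the ID pairings are flagged" but does not spell out edge cases). The `flag_suspect` helper and sample IDs are hypothetical.

```python
from collections import defaultdict


def flag_suspect(pairs):
    """Map each (local_id, universal_id) pair to a Suspect flag.

    A pair is suspect when its Local ID is paired with more than one
    distinct Universal ID. Many Local IDs sharing one Universal ID is
    the expected case and is not flagged.
    """
    universals_per_local = defaultdict(set)
    for local_id, universal_id in pairs:
        universals_per_local[local_id].add(universal_id)
    return {
        (local_id, universal_id): len(universals_per_local[local_id]) > 1
        for local_id, universal_id in pairs
    }


# c1 and c2 share u1 (expected); c3 maps to both u2 and u3 (suspect).
pairs = [("c1", "u1"), ("c2", "u1"), ("c3", "u2"), ("c3", "u3")]
for pair, suspect in flag_suspect(pairs).items():
    print(pair, "TRUE" if suspect else "FALSE")
```

Grouping by Local ID rather than by Universal ID is what makes the common many-Local-to-one-Universal case pass unflagged.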
The available fields are:
- Local ID - value (e.g. cookie ID) selected on the Parameters screen
- Universal ID - value (e.g. customer ID) selected on the Parameters screen
- First Seen Time - value (e.g. date/time) selected on the Parameters screen
- Suspect - TRUE/FALSE indicating whether the Local ID has a many-to-one relationship, meaning multiple Universal IDs map to one Local ID
- Input Source - input dataset name
- Identity Partition - input dataset event_partition value when ID pairing was found
- Input Source Partition - input dataset name as a partition
ID Graph datasets are typically used in Unified Events apps to support ID stitching, in which anonymous page views are stamped with the Universal ID wherever a matching Local ID exists. The ID Graph dataset can also be used as an ID lookup in other apps.
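A minimal sketch of the stitching idea, in plain Python rather than anything Unified Events actually runs: build a Local ID to Universal ID lookup from the ID Graph and stamp each event whose Local ID matches. The row layout, sample data, and `stitch` helper are hypothetical, and excluding suspect pairings from the lookup is an assumption made here because such a Local ID cannot be resolved to a single Universal ID.

```python
# Hypothetical ID Graph rows: (local_id, universal_id, suspect)
id_graph = [
    ("c1", "u1", False),
    ("c2", "u1", False),
    ("c3", "u2", True),
    ("c3", "u3", True),
]

# Hypothetical anonymous page views keyed by Local ID.
page_views = [
    {"local_id": "c1", "page": "/home"},
    {"local_id": "c2", "page": "/cart"},
    {"local_id": "c3", "page": "/help"},
    {"local_id": "c9", "page": "/home"},
]


def stitch(events, graph):
    """Stamp each event with a Universal ID when its Local ID has a match."""
    # Assumption: suspect pairings are left out of the lookup.
    lookup = {local: universal for local, universal, suspect in graph if not suspect}
    stitched = []
    for event in events:
        row = dict(event)  # copy so the input events are not modified
        row["universal_id"] = lookup.get(event["local_id"])  # None if no match
        stitched.append(row)
    return stitched


for row in stitch(page_views, id_graph):
    print(row["local_id"], row["page"], row["universal_id"])
```

Page views for `c1` and `c2` are both attributed to `u1`, while `c3` (suspect) and `c9` (no pairing) remain anonymous.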