The purpose of the Event Loader process is to ingest non-Adobe raw files, such as marketing data or enterprise lookup data.
Process Configuration
The Event Loader process is configured through three screens: Input, Schema, and Outputs. These screens define the location and details of the files within the source connection, map the data to a schema, and specify where the data is written. Each screen and its fields are described below.
Input
This screen tells Syntasa where the files reside within the source connection and provides the details of the files that need to be ingested.
- Source Path - folder within the source connection where the files reside. Do not include the bucket name or the directory name specified in the Connection.
- Source File Pattern - file name pattern used to pull specific raw source files from the connection. This makes it possible to select only the relevant files from a Connection where multiple source files exist. Keep in mind that a source file may be a .tar or .zip archive containing multiple files, such as the raw data and supporting enrichment data.
- Event File Pattern - file name of the separate raw events file within the source file, if one exists; if not, use the same pattern as the Source File Pattern. For example, Adobe delivers a .tar file named with the report suite and date, and inside that .tar file is the hit_data.tsv file. In this example, the user would enter hit_data.tsv in this field because it is the event file within the source file.
- File Type - type of the source file: Tar, Textfile, or Zip
- Compression Type - specify whether the file is compressed and, if so, the compression type
- Incremental Load - option to keep previously loaded files or to overwrite them, as in the case of a one-time lookup load
- File Name Has Date - specify whether the file names contain a date; enable this if the data source needs to be partitioned by date
- Date Pattern - pattern of the date in the file name (e.g., yyyy-MM-dd)
- Date Extraction Type - method used to extract the date from the file name: Regex or Index
- Regex Pattern - regular expression used to extract the date from the file name when Date Extraction Type is Regex
- Group Number - number of the capturing group in the Regex Pattern that contains the date
- Date Manipulation - shifts the partition date by a positive or negative number of days relative to the file date (e.g., if the file date is 2018-08-01 but the contents of the file are for 2018-07-31, a shift of -1 day aligns the partition with the data)
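The Source File Pattern and Event File Pattern fields can be illustrated with a minimal Python sketch. It assumes glob-style wildcards (as matched by Python's fnmatch module) and a gzip-compressed .tar source file; Syntasa's actual matching rules may differ. The function names select_source_files, read_event_files, and build_demo_archive, as well as the file names, are hypothetical.

```python
import fnmatch
import io
import os
import tarfile


def select_source_files(file_names, source_file_pattern):
    """Keep the files whose names match the Source File Pattern (glob-style, assumed)."""
    return [name for name in file_names if fnmatch.fnmatchcase(name, source_file_pattern)]


def read_event_files(archive_bytes, event_file_pattern):
    """Return {member name: contents} for archive members matching the Event File Pattern.

    Mode "r:*" lets tarfile detect gzip, bz2, or xz compression automatically.
    """
    matches = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            # Match on the base name so members stored in sub-folders are found too.
            base_name = os.path.basename(member.name)
            if member.isfile() and fnmatch.fnmatchcase(base_name, event_file_pattern):
                matches[member.name] = archive.extractfile(member).read()
    return matches


def build_demo_archive(members):
    """Build an in-memory .tar.gz from {name: bytes}, standing in for a delivered source file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    listing = ["mysuite_2018-08-01.tar.gz", "mysuite_2018-08-02.tar.gz", "readme.txt"]
    print(select_source_files(listing, "mysuite_*.tar.gz"))
    archive = build_demo_archive({
        "hit_data.tsv": b"1001\t2018-07-31\thome\n",
        "browser.tsv": b"1\tChrome\n",
    })
    print(read_event_files(archive, "hit_data.tsv"))
```

Here the Source File Pattern narrows the connection listing to the delivered archives, and the Event File Pattern then picks hit_data.tsv out of each archive while leaving supporting files such as browser.tsv aside.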
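The date fields (Date Pattern, Date Extraction Type, Regex Pattern, Group Number, and Date Manipulation) work together to turn a file name into a partition date. The Python sketch below shows one way to do this under stated assumptions: the token mapping covers only a few Java-style tokens, and "Index" extraction is assumed to mean a fixed start offset with a date as long as the Date Pattern. Syntasa's own semantics for Index extraction are not described here, and all function names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Hypothetical mapping from a few Java-style date tokens (as in "yyyy-MM-dd") to strptime codes.
DATE_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def to_strptime_format(date_pattern):
    """Translate a Date Pattern such as yyyy-MM-dd into a strptime format such as %Y-%m-%d."""
    fmt = date_pattern
    for token, directive in DATE_TOKENS.items():
        fmt = fmt.replace(token, directive)
    return fmt


def extract_file_date(file_name, date_pattern, extraction_type,
                      regex_pattern=None, group_number=1, start_index=0) -> date:
    """Pull the date out of a file name using Regex or Index extraction."""
    if extraction_type == "Regex":
        match = re.search(regex_pattern, file_name)
        if match is None:
            raise ValueError(f"no date found in {file_name!r}")
        raw_date = match.group(group_number)  # Group Number selects the capturing group
    elif extraction_type == "Index":
        # Assumption: the date starts at start_index and is as long as the Date Pattern
        # (true for fixed-width patterns such as yyyy-MM-dd).
        raw_date = file_name[start_index:start_index + len(date_pattern)]
    else:
        raise ValueError(f"unknown extraction type {extraction_type!r}")
    return datetime.strptime(raw_date, to_strptime_format(date_pattern)).date()


def partition_date(file_date: date, shift_days: int) -> date:
    """Apply Date Manipulation: shift the partition date by a signed number of days."""
    return file_date + timedelta(days=shift_days)


if __name__ == "__main__":
    name = "mysuite_2018-08-01.tar.gz"
    by_regex = extract_file_date(name, "yyyy-MM-dd", "Regex",
                                 regex_pattern=r"([a-z]+)_(\d{4}-\d{2}-\d{2})", group_number=2)
    by_index = extract_file_date(name, "yyyy-MM-dd", "Index", start_index=8)
    print(by_regex, by_index, partition_date(by_regex, -1))  # prints 2018-08-01 2018-08-01 2018-07-31
```

The regex in the usage example has two capturing groups, so Group Number 2 is needed to select the date rather than the report-suite prefix; the -1 shift reproduces the 2018-08-01 to 2018-07-31 case from the Date Manipulation field.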
Schema
The Schema screen provides Syntasa with the details of how the data is structured within the file, including the delimiter that separates the fields and the column names, which help build the schema in Syntasa Event Enrich.
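Conceptually, the delimiter and column names entered on this screen are enough to turn each line of a raw file into a named record. A minimal Python sketch of that idea follows, using the standard csv module. The function name parse_delimited and the sample columns are hypothetical, and the sketch assumes unquoted fields and string-typed values, which may not match how Syntasa actually builds the schema.

```python
import csv
import io


def parse_delimited(text, delimiter, column_names):
    """Map each delimited row onto the column names entered on the Schema screen.

    Values stay as strings, as in a raw table; fields are assumed to be unquoted.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(column_names):
            raise ValueError(
                f"line {reader.line_num}: expected {len(column_names)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(column_names, fields)))
    return rows


if __name__ == "__main__":
    sample = "1001\t2018-07-31\thome\n1002\t2018-07-31\tcart\n"
    for row in parse_delimited(sample, "\t", ["visitor_id", "hit_date", "page"]):
        print(row)
```

Rejecting rows whose field count differs from the number of columns surfaces a wrong delimiter or an incomplete column list early, rather than silently shifting values into the wrong columns.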
Outputs
The Outputs tab provides the ability to name the output tables and the display names shown on the graph canvas, and to select Load to BQ, which loads the tables into BigQuery (BQ) when deployed on Google Cloud Platform (GCP), or Load to Redshift when deployed on AWS.
Expected Output
The expected output of the Event Loader process is a set of raw tables, which can be queried if Load to BQ or Load to Redshift is selected. The data will be present in the storage location (e.g., a GCS bucket or an S3 bucket), and Hive metastore tables are created to facilitate data mapping within the Event Enrich process.