The purpose of the Matomo Loader process is to retrieve raw analytics data from Matomo and ingest it into Syntasa.
Process Configuration
The Matomo Loader process includes three tabs that provide the ability to define input configurations, map the data to the schema, and specify where the data is written. Below are details of each screen and descriptions of its fields.
Input
This section provides the information Syntasa needs to understand the source connection path of the files and details of the files that need to be ingested.
- Fields To Flatten - provide comma-separated field values. You can also use the Loader Event Configure table to view a preview of the data.
- Source Path - folder within the source connection where the files reside. Do not include the bucket name or the directory specified in the Connection.
- Source File Pattern - file name pattern used to pull specific source files from the connection. Keep in mind that the files may be .tar or .zip archives containing multiple files, such as the raw data and supporting enrichment data; the pattern allows pulling specific files from a Connection where multiple source files exist.
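To illustrate how a source file pattern narrows the file set, the sketch below applies a glob-style pattern to a hypothetical file listing using Python's `fnmatch`. The filenames and the pattern are assumptions for illustration, not actual Matomo output:

```python
import fnmatch

# Hypothetical file listing from the source connection
files = [
    "matomo_hits_20180801.tar.gz",
    "matomo_lookup_20180801.tar.gz",
    "readme.txt",
]

# The source file pattern selects only the raw data archives
matched = fnmatch.filter(files, "matomo_hits_*.tar.gz")
print(matched)  # ['matomo_hits_20180801.tar.gz']
```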
Date Parsing
This section defines the filename date structure that Syntasa uses to build partitions when processing the files.
- Date Pattern - defines the pattern of the date within the source filename. This date is typically in the yyyyMMdd or yyyy-MM-dd format.
- Date Extraction Type - specifies the method of extracting the date via a dropdown menu with the options 'Regex' and 'Index'. Regex is the recommended method because a source may deliver many files in one day, and the regex extraction type provides the flexibility to pick up all of them. Index specifies a start position on the filename where the date begins and an end position where the date ends, with the index starting at 0. For example, the filename syntasademo_20180801.tar.gz would have a start index of 12 and an end index of 20.
- Regex Pattern - defines where the file date appears in the filename. For example, syntasademo_(.*).tar.gz creates a group between the underscore and the start of the extension.
- Group Number - the group number (text between parentheses) that Syntasa should use to locate the file date. The regex pattern above defines only one group, so the group number should be set to 1.
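Both extraction types can be sketched in Python against the example filename above. The escaped regex and the parsing step are assumptions about how the configured pattern is interpreted:

```python
import re
from datetime import datetime

filename = "syntasademo_20180801.tar.gz"

# 'Regex' extraction: group 1 captures the text between the
# underscore and the extension (dots escaped for literal matching)
match = re.match(r"syntasademo_(.*)\.tar\.gz", filename)
regex_date = match.group(1)

# 'Index' extraction: start index 12, end index 20 (0-based)
index_date = filename[12:20]

assert regex_date == index_date == "20180801"

# The extracted string parsed with the yyyyMMdd date pattern
partition_date = datetime.strptime(regex_date, "%Y%m%d").date()
print(partition_date)  # 2018-08-01
```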
Date Manipulation (Optional)
Date manipulation is rarely needed and can usually be ignored for this process. It is only configured when the event data in the files has a different date than the filename.
- Days - the number of days of difference; this can be a positive or negative value.
- Chronology - in most cases where this configuration is needed, the selected option will be 'Days'.
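The manipulation described above amounts to shifting the filename date by a fixed number of days. A minimal sketch, assuming a hypothetical offset of -1 day:

```python
from datetime import datetime, timedelta

# Date extracted from the filename
file_date = datetime.strptime("20180801", "%Y%m%d").date()

# Days = -1, Chronology = 'Days': the event data is assumed to be
# one day earlier than the filename date (hypothetical offset)
event_date = file_date + timedelta(days=-1)
print(event_date)  # 2018-07-31
```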
Schema
The Schema screen provides Syntasa with the details of how the data is structured within the file, including how the fields are delimited and the column names, to help build the schema in Syntasa.
Actions
For the Matomo process, three options are available at this time: Import, Export, and Auto Fill. Import is used when the client has JSON data available that provides the custom mappings. Export exports the existing mapping schema in .csv format, which can be used to assist in editing or manipulating the schema; the updated file can then be imported to load the revised schema into the dataset.
- Add Columns - adds a new column to the schema
- Clear - removes all fields
- Import
- Click Import
- Click on the green paperclip icon to browse to the desired file to import
- Once the file is selected, click Open
- Click Apply
- Wait 60 seconds to ensure the process of pulling in mappings and labels is complete
- Use the scroll, order, and search options to locate the cust_fields and cust_metrics fields to ensure all custom fields have been mapped
- Export
1. Click on the Export icon
2. A file named syntasa_mapping_export.csv will be created and downloaded for the user
- Auto Fill - for a specific date range, populates all the orders and names into the schema table
Outputs
The Outputs tab provides the ability to name the tables and their displayed names on the graph canvas, and to select whether to load to BigQuery (BQ) when deployed in Google Cloud Platform (GCP) or to Redshift when deployed in AWS.