The GA Adaptor configures and imports a file into Google Analytics Data Import. Data Import provides a means to upload external data (e.g. CRM records, a product catalog, model scores) to GA, supplying metadata that adds context to the Analytics behavioral data.
The GA Adaptor process has three main configuration screens: one defines the input dataset, one sets the parameters for constructing the data import file and its upload requirements, and one covers the output settings for handling the file.
- Input Selection: choose the input data source
Key Field: the Syntasa identifier field (e.g. GA Client ID). Only one field is allowed.
Key Dimension: the dimension in GA that matches the Key Field and is integrated with the GA tracking code (e.g. Client ID). GA custom dimensions take the form ga:dimensionname (e.g. ga:dimension1). Only one dimension is allowed.
Data Field: a comma-separated list of field names from the input table (e.g. hair,eyes,car)
Data Dimension: a comma-separated list of existing GA dimensions aligning one-to-one with the fields in Data Field (e.g. ga:dimension2,ga:dimension3,ga:dimension4). For demonstration purposes, GA dimensions 2-4 were set to hair, eyes, and car.
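The Data Field and Data Dimension lists must line up one-to-one, with the Key Dimension becoming the first column of the upload file. A minimal sketch of how that mapping and the resulting header row could be assembled (the field and dimension values below are illustrative, not required names):

```python
# Illustrative configuration values; real ones come from the adaptor screens.
key_dimension = "ga:dimension1"           # Key Dimension (e.g. Client ID)
data_fields = "hair,eyes,car".split(",")  # Data Field entries from the input table
data_dimensions = "ga:dimension2,ga:dimension3,ga:dimension4".split(",")

# Each input field must map to exactly one GA dimension.
if len(data_fields) != len(data_dimensions):
    raise ValueError("Data Field and Data Dimension counts must match")

# Map input columns to their GA dimensions, e.g. {"hair": "ga:dimension2", ...}
field_to_dimension = dict(zip(data_fields, data_dimensions))

# The upload file's header row: Key Dimension first, then the data dimensions.
header = ",".join([key_dimension] + data_dimensions)
print(header)  # ga:dimension1,ga:dimension2,ga:dimension3,ga:dimension4
```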
GA View ID: a Data Import must first be created in GA, and a View is associated with that Data Import record. Enter the same View here.
GA Account ID: Account ID of the GA implementation (https://support.google.com/analytics/answer/1008080?hl=en)
GA Web Property ID: Web Property of the GA implementation (https://support.google.com/analytics/answer/7372977?hl=en)
Custom Data Source ID: This is the Data Import - Data Set ID (https://support.google.com/analytics/answer/4524584?hl=en)
GCP Bucket Name: the name of the GCS bucket in which the service account file resides
Service Account File Path: the path, including the full file name, of the service account JSON file generated when the service account is created in GCP
Partition Column Name: the name of the partition column in the input data source. Typically this is event_partition; if the input comes from a From BQ process it may be file_date. Custom analytics datasets may use a different partition column, so inspect the input data source first.
api/file/both: select whether the process runs the API upload only, writes the file only, or both. Type one of the following in lower case: api, file, or both.
Output Bucket: if file or both is chosen, the name of the GCS bucket the CSV file gets written to, with no leading or trailing slashes
Output File Path: the folder the file should be written to
File Retention: the number of days the file persists in GCS before being automatically purged
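Retention works as a cutoff: a file older than the retention window becomes eligible for purging. A small sketch of that check, using hypothetical names (the actual purge is handled by the process, not user code):

```python
from datetime import datetime, timedelta, timezone

def is_expired(file_created: datetime, retention_days: int, now: datetime) -> bool:
    """Return True when a file has outlived its retention window."""
    return now - file_created > timedelta(days=retention_days)

# Example: a file created 10 days ago with 7-day retention is purged;
# with 30-day retention it is kept.
now = datetime(2024, 1, 11, tzinfo=timezone.utc)
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(created, 7, now))   # True
print(is_expired(created, 30, now))  # False
```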
Bucket - Config JSON: all of the above fields can instead be configured in a JSON file that the process picks up and uses. This is the bucket where that JSON file resides, with no leading or trailing slashes (e.g. syntasa-gcp-project)
File Path - Config JSON: the folder and name of the config file, without leading or trailing slashes (e.g. audiences/config/params.json)
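For reference, a config JSON covering the fields above might look like the following. The key names and all values are illustrative assumptions; confirm the exact key names the process expects before relying on this:

```json
{
  "key_field": "ga_client_id",
  "key_dimension": "ga:dimension1",
  "data_field": "hair,eyes,car",
  "data_dimension": "ga:dimension2,ga:dimension3,ga:dimension4",
  "ga_view_id": "12345678",
  "ga_account_id": "1234567",
  "ga_web_property_id": "UA-1234567-1",
  "custom_data_source_id": "abcDEFghiJKLmnoPQ",
  "gcp_bucket_name": "syntasa-gcp-project",
  "service_account_file_path": "credentials/service-account.json",
  "partition_column_name": "event_partition",
  "mode": "both",
  "output_bucket": "syntasa-gcp-project",
  "output_file_path": "audiences/output",
  "file_retention": 30
}
```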
- API Output
- Table Name: name of the API output table
- Display Name: the name displayed on the output node; suggested to match the Table Name
- Partition Scheme: the scheme used to partition the data (daily or hourly); in most cases this is set to daily
- File Format: the format of the files stored in GCS; Parquet is used in most cases
- Load To BQ: toggle on to load the data into a BQ table based on the chosen event store
- Table Name: the GA upload table name
- Display Name: the display name of the output process node; suggested to match the Table Name
- Provide a descriptive name for the process, ideally beginning with a verb (e.g. Import GA data)
The expected output of the GA Adaptor process is a comma-separated values (CSV) file with a header row, ready for upload to GA. Based on the adaptor's configuration, the file may be sent via the API to the respective Data Import record in GA, stored in GCS for manual upload, or both.
Along with the upload-ready file, the data is written to a partitioned table, enabling further analysis of the data produced by the process for debugging or general data analysis purposes.
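As a sketch of what the upload-ready file looks like, the following builds a header row of GA dimensions followed by one row per user, using Python's csv module. The column names and values are illustrative; the real process derives them from the configured Key Dimension and Data Dimensions and writes the result to GCS:

```python
import csv
import io

# Header: Key Dimension first, then the data dimensions (illustrative names).
header = ["ga:dimension1", "ga:dimension2", "ga:dimension3", "ga:dimension4"]
rows = [
    ["1362910542.1623677113", "brown", "green", "sedan"],
    ["1174032103.1623677145", "black", "blue", "truck"],
]

# A StringIO stands in for the CSV file the process would write to GCS.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```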