Now that the Syntasa software is installed, there's a bit of setup needed before you can take advantage of the platform. The high-level steps of system configuration are described in the article System Configuration Guide.
This article dives into the details of the first of those steps, Infrastructure, if your Syntasa environment has been installed within Google Cloud Platform (GCP):
- Locating prerequisite information
- Identifying and entering Infrastructure information for the following sections: Environment, Network, Storage, Metastore, Security, Big Query, and Big Table
- Final review of Infrastructure information
Locating prerequisite information
You will need access to the Google Cloud Console with admin/owner level permissions. We suggest opening Google Cloud Console in one tab and the Syntasa Infrastructure page in another. We will be finding the information in Google Cloud Console and inputting it into the Syntasa Infrastructure screen for the rest of this guide.
The Infrastructure screen is available from the settings menu (gear icon). Only users with the System Admin role have access to this menu.
Environment
Fill out the fields in the Environment section of the Infrastructure page according to the settings within your GCP project. The screenshot below shows the values from the example; follow the instructions for each field to apply the correct values for your environment.
Provider - Select GCP from the drop-down.
Project ID:
- Navigate to the project in which Syntasa is installed via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "Home".
- Find the project ID in the "Project Info" card on the top left of the "Home" page.
Region:
- While still in the "Home" screen, click on "SQL" in the "Resources" card below the "Project Info" card referenced above.
- This will take you to the SQL Instances page. Note the "location" column towards the right. It will look something like "us-east4-b".
- The region is the location without its trailing zone letter. Using the example above, find "us-east4" in the Region dropdown.
Zone - Using the same example, find the zone "us-east4-b" in the Zone dropdown.
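The relationship between the "location" value and the Region and Zone fields can be sketched in a few lines of Python. This is only an illustration, assuming the location is a zone-style string such as "us-east4-b"; the function name `split_location` is hypothetical and not part of Syntasa or GCP.

```python
from typing import Tuple


def split_location(location: str) -> Tuple[str, str]:
    """Split a zone-style location such as 'us-east4-b' into (region, zone)."""
    # Everything before the last hyphen is the region; the last part is the zone letter.
    region, sep, suffix = location.strip().rpartition("-")
    # Assumption: a zone ends in a single letter (for example "-b").
    if not sep or not region or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a zone-style location: {location!r}")
    return region, location.strip()


region, zone = split_location("us-east4-b")
print(region)  # us-east4
print(zone)    # us-east4-b
```

If the location column shows a value that does not end in a single zone letter, the sketch raises an error rather than guessing.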
Runtime Max Up Time - This configuration is a cost-control measure. A cluster will be automatically deleted after this threshold has been passed, even if a job is still running. We recommend setting this to 12 hours.
Runtime Max Up Time Unit - We recommend setting the time unit to Hours.
Network
In this section, we tell Syntasa about the VPC settings in your GCP project. The screenshot below shows the values from the example; follow the instructions for each field to apply the correct values for your environment.
Network Name - Leave blank.
Sub Network Name:
- Navigate to the project that Syntasa is installed into via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "VPC network" and then choose the submenu "VPC networks".
- If multiple VPC networks are configured and one has "dataproc" in its name, choose that one.
- Enter the name of its VPC subnetwork in the Infrastructure screen. The Sub Network Name field expects the subnetwork name.
Storage
Fill out the fields in the Storage section of the Infrastructure page according to the settings within your GCP project. The screenshot below shows the values from the example; follow the instructions for each field to apply the correct values for your environment.
Bucket
- Navigate to the project in which Syntasa is installed via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "Storage" and then choose the submenu "Browser".
- Create a top-level bucket for this project using the "CREATE BUCKET" button at the top of the page. Any valid, globally unique bucket name works, but we recommend a short name that is close to the name of the GCP project.
- Add this name to this field in the Infrastructure screen.
Config Folder - This can be set to any valid folder within the bucket, but we recommend:
syn-cluster-config
Logs Folder - This can be set to any valid folder within the bucket, but we recommend:
syn-cluster-logs
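To see how the three Storage values relate to each other, the following Python sketch composes the resulting Cloud Storage locations. It is only an illustration: the function `storage_paths` and the bucket name `my-project-syntasa` are hypothetical, and how Syntasa combines these fields internally is not described in this article.

```python
def storage_paths(bucket, config_folder="syn-cluster-config", logs_folder="syn-cluster-logs"):
    """Return the gs:// locations implied by the Bucket and folder fields."""
    name = bucket.strip()
    # Accept either a bare bucket name or a gs:// URI for convenience.
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    name = name.strip("/")
    # The Bucket field holds a top-level bucket, so it must not contain a path.
    if not name or "/" in name:
        raise ValueError(f"expected a top-level bucket name, got {bucket!r}")
    return {
        "config": f"gs://{name}/{config_folder.strip('/')}/",
        "logs": f"gs://{name}/{logs_folder.strip('/')}/",
    }


paths = storage_paths("my-project-syntasa")  # hypothetical bucket name
print(paths["config"])  # gs://my-project-syntasa/syn-cluster-config/
print(paths["logs"])    # gs://my-project-syntasa/syn-cluster-logs/
```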
Metastore
This section covers the configuration options for the Apache Hive Metastore used by this Syntasa installation. The screenshot below shows the values from the example; follow the instructions for each field to apply the correct values for your environment.
File Format - We suggest picking "Parquet" from the drop-down menu.
Metastore Type - Unless otherwise instructed by the support staff, pick "Cloud SQL - Postgres" from the drop-down menu.
Metastore Host Name:
- Navigate to the project that Syntasa is installed into via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "SQL". The Metastore Host Name is the "Instance ID" now shown on the screen.
Metastore Password:
This password was created when the Cloud SQL instance was created and is the password of the user "postgres". If you have lost this password, you can reset it as follows:
- Navigate to the project that Syntasa is installed into via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "SQL". Click on the Instance ID to enter and view the settings of this Cloud SQL instance.
- Click on "Users" on the left pane.
- Click on the three-dot menu next to the user "postgres", choose "change password" and follow the prompts to reset the password.
- Enter this as the Metastore Password in the Syntasa Infrastructure screen.
Metastore DB Name - This is the name of the Hive database that the system will create. It can be any name, but we recommend:
hive_metastore
Security
Use Instance Profile - If this drop-down is set to True, Syntasa uses the credentials associated with the machine on which it is installed and running. If it is set to False, a dialog lets you upload a JSON credential file associated with a service account that has been granted permissions to use the GCP services Syntasa relies on. The default setting is True.
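If you set Use Instance Profile to False, it can help to sanity-check the JSON credential file before uploading it. The sketch below is a minimal, assumption-laden check and not part of Syntasa: the function `check_key_file` is hypothetical, and the field list reflects the usual layout of a GCP service account key file rather than anything Syntasa documents.

```python
import json

# Fields normally present in a GCP service account key file (assumed minimum set).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_key_file(text):
    """Return a list of problems found in service account key JSON text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if not data.get(key)]
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems


# Usage with a hypothetical file name:
# with open("service-account.json") as handle:
#     print(check_key_file(handle.read()) or "looks like a service account key")
```

An empty list means the file has the expected shape. It does not confirm that the service account actually holds the required permissions.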
Big Query
Fill out the fields in the Big Query section of the Infrastructure page according to the settings within your GCP project. The screenshot below shows the values from the example; follow the instructions for each field to apply the correct values for your environment.
Project Id - For a standard installation, set this field to the same Project ID that was entered in the Environment section above.
Dataset Location - Please set this to coincide with your preferred region, US or EU.
Use Instance Profile - If this drop-down is set to True, Syntasa uses the credentials associated with the machine on which it is installed and running. If it is set to False, a dialog lets you upload a JSON credential file associated with a service account that has been granted permissions to use the GCP services Syntasa relies on. The default setting is True.
Big Table
This section is optional. Configure it only if you plan to use Google Bigtable as part of a Syntasa app.
Project Id - For a standard installation, set this field to the same Project ID that was entered in the Environment section above.
Instance Id:
A Bigtable instance must exist before you configure this section of the Syntasa Infrastructure screen.
- Navigate to the project that Syntasa is installed into via the drop-down selector at the very top bar of the Google Cloud Console.
- Using the hamburger menu on the top left, choose "Bigtable". Click on the Instance ID to enter and view the settings of this Bigtable instance.
- Enter the Bigtable Instance ID found on this screen into the Syntasa Infrastructure screen.
Use Instance Profile - If this drop-down is set to True, Syntasa uses the credentials associated with the machine on which it is installed and running. If it is set to False, a dialog lets you upload a JSON credential file associated with a service account that has been granted permissions to use the GCP services Syntasa relies on. The default setting is True.
Final review of Infrastructure information
Final Syntasa Infrastructure screen for a GCP installation: