As noted in the Syntasa Quick Start Guide, a few steps need to be completed before the Syntasa platform can be installed and process data in your cloud: build the foundation within your virtual private cloud (VPC), install the Syntasa software on that foundation, and finally connect the software to the foundation.
This article details the first of those steps, preparing the cloud environment, specifically for Google Cloud Platform (GCP):
- GCP project setup
- Connectivity settings
- Services and user accounts
- Compute and storage setup
- Options for installing the Syntasa software
GCP project setup
A new, dedicated GCP project should be set up to contain and support the installation and data processing that will occur with the Syntasa software. A suggested name for this project is "syntasa-app-project", but this can be named differently if your organization has specific naming conventions.
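As a minimal sketch of the project setup step, the Python snippet below checks a candidate project ID against GCP's ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen) and builds the corresponding gcloud command without running it. The display name "Syntasa App Project" is an assumption for illustration; only the ID "syntasa-app-project" comes from this article.

```python
import re
import shlex
from typing import List

# GCP project IDs: 6-30 chars, lowercase letters, digits and hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if the string is a well-formed GCP project ID."""
    return PROJECT_ID_PATTERN.match(project_id) is not None


def create_project_command(project_id: str, display_name: str) -> List[str]:
    """Build (but do not run) the gcloud command that creates the project."""
    if not is_valid_project_id(project_id):
        raise ValueError(f"not a valid GCP project ID: {project_id!r}")
    return ["gcloud", "projects", "create", project_id, f"--name={display_name}"]


# "Syntasa App Project" is a hypothetical display name.
command = create_project_command("syntasa-app-project", "Syntasa App Project")
print(shlex.join(command))
```

Project IDs are unique across all of GCP, so the suggested ID may already be taken; in that case a prefixed or suffixed variant that follows your naming conventions works equally well.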
Connectivity settings
Within the newly created project, a number of settings related to connectivity need to be created and configured. Details of each setting and rule can be provided by the Syntasa services team.
- VPC and subnets - A VPC needs to be created within the new project, which can be named as per your organization's naming conventions. Two subnets with the following names also need to be created: syntasa-gke-subnet and syntasa-dataproc-subnet.
- Firewall rules - A number of firewall rules need to be added to allow Syntasa to access both subnets and to allow access to the Syntasa application node.
- External IP addresses - A static IP address needs to be reserved for use by the Syntasa application node.
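The VPC, subnet, and static IP items above could be sketched as follows. The snippet only builds gcloud commands and prints them; nothing is executed. The network name, region, CIDR ranges, and address name are hypothetical placeholders, not values supplied by Syntasa, and firewall rules are left out because their details come from the Syntasa services team.

```python
import ipaddress
import shlex
from typing import Dict, List

PROJECT = "syntasa-app-project"   # suggested project name from this article
NETWORK = "syntasa-vpc"           # hypothetical VPC name
REGION = "us-central1"            # hypothetical region
ADDRESS_NAME = "syntasa-app-ip"   # hypothetical name for the static IP

# Subnet names come from this article; the CIDR ranges are hypothetical.
SUBNET_RANGES: Dict[str, str] = {
    "syntasa-gke-subnet": "10.10.0.0/20",
    "syntasa-dataproc-subnet": "10.10.16.0/20",
}


def ranges_overlap(ranges: Dict[str, str]) -> bool:
    """Return True if any two of the given CIDR ranges overlap."""
    networks = [ipaddress.ip_network(cidr) for cidr in ranges.values()]
    return any(
        a.overlaps(b)
        for i, a in enumerate(networks)
        for b in networks[i + 1:]
    )


def network_commands() -> List[List[str]]:
    """Build the gcloud commands for the VPC, both subnets and the static IP."""
    commands = [[
        "gcloud", "compute", "networks", "create", NETWORK,
        "--subnet-mode=custom", f"--project={PROJECT}",
    ]]
    for name, cidr in SUBNET_RANGES.items():
        commands.append([
            "gcloud", "compute", "networks", "subnets", "create", name,
            f"--network={NETWORK}", f"--region={REGION}",
            f"--range={cidr}", f"--project={PROJECT}",
        ])
    commands.append([
        "gcloud", "compute", "addresses", "create", ADDRESS_NAME,
        f"--region={REGION}", f"--project={PROJECT}",
    ])
    return commands


if ranges_overlap(SUBNET_RANGES):
    raise SystemExit("subnet ranges overlap; choose different CIDR blocks")
for command in network_commands():
    print(shlex.join(command))
```

A custom-mode VPC is used here so that only the two named subnets exist; whether that matches your organization's network policy is something to confirm before running anything.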
Services and user accounts
Within the newly created project, a number of services need to be enabled on the GCP project's APIs & Services page. The full list of required and optional selections will be provided by the Syntasa services team; the key services are listed below:
- Google Compute
- Google BigQuery
- Google Cloud SQL
- Google Kubernetes Engine
- Google Cloud Storage
- Google Network Services
- Google Dataproc
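For readers who prefer the command line to the APIs & Services page, the sketch below maps the key services to API identifiers and builds a single `gcloud services enable` command without running it. The identifiers are a best-effort mapping of the product names above and should be checked against the list provided by the Syntasa services team. "Google Network Services" is deliberately left unmapped because this article does not say which API it refers to.

```python
import shlex
from typing import Dict, List

PROJECT = "syntasa-app-project"  # suggested project name from this article

# Assumed mapping from the product names in this article to API identifiers.
# "Google Network Services" is omitted on purpose: the exact API is not
# stated here and should be confirmed with the Syntasa services team.
KEY_SERVICES: Dict[str, str] = {
    "Google Compute": "compute.googleapis.com",
    "Google BigQuery": "bigquery.googleapis.com",
    "Google Cloud SQL": "sqladmin.googleapis.com",
    "Google Kubernetes Engine": "container.googleapis.com",
    "Google Cloud Storage": "storage.googleapis.com",
    "Google Dataproc": "dataproc.googleapis.com",
}


def enable_services_command(project: str, services: List[str]) -> List[str]:
    """Build (but do not run) one gcloud command that enables all services."""
    return ["gcloud", "services", "enable", *services, f"--project={project}"]


enable_command = enable_services_command(PROJECT, list(KEY_SERVICES.values()))
print(shlex.join(enable_command))
```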
Also, two users/accounts need to be established for the Syntasa software:
- IAM User - This is needed to execute the installation of the software. If a user already exists or if you are using a project owner/editor user then this step can be skipped.
- Service Account - This service account, named "syntasa-application-sa", will be used to create the Kubernetes cluster and will also be used by the Syntasa application to create Dataproc clusters and access other services. The detailed list of roles needed will be provided by the Syntasa services team.
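A rough sketch of the service account step is shown below. It builds the gcloud commands that create the account and bind roles to it, without running them. The display name is a hypothetical placeholder, and the role list is intentionally empty because the required roles are supplied by the Syntasa services team.

```python
import shlex
from typing import List

PROJECT = "syntasa-app-project"      # suggested project name from this article
ACCOUNT_NAME = "syntasa-application-sa"
DISPLAY_NAME = "Syntasa application"  # hypothetical display name

# Intentionally empty: fill in with the roles provided by the Syntasa
# services team, for example entries of the form "roles/...".
ROLES: List[str] = []


def service_account_email(account_name: str, project: str) -> str:
    """Return the email address GCP assigns to a user-managed service account."""
    return f"{account_name}@{project}.iam.gserviceaccount.com"


def service_account_commands(roles: List[str]) -> List[List[str]]:
    """Build the commands to create the account and grant each role."""
    email = service_account_email(ACCOUNT_NAME, PROJECT)
    commands = [[
        "gcloud", "iam", "service-accounts", "create", ACCOUNT_NAME,
        f"--display-name={DISPLAY_NAME}", f"--project={PROJECT}",
    ]]
    for role in roles:
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", PROJECT,
            f"--member=serviceAccount:{email}", f"--role={role}",
        ])
    return commands


for command in service_account_commands(ROLES):
    print(shlex.join(command))
```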
Compute and storage setup
A number of computing resources and storage buckets need to be set up within the new GCP project. These are used for the processing of data within the Syntasa data pipelines and the storage of the resulting data. Full details of settings and options will be provided by the Syntasa services team.
- Cloud SQL instance - This should be created as type PostgreSQL 9.4+ and will be used as a metastore. The suggested name for this is "syntasa-pg-metastore", but this can be varied per your organization's naming convention.
- Persistent volume - This is found via Compute Engine under the section Disks. SSD is suggested, but "standard persistent disk" will work as well. The suggested name for this is "syntasa-kubernetes-pd", but this can be varied per your organization's naming convention.
- Storage bucket - The suggested name for the storage bucket is "syntasa-app", but this can be varied based on your organization's naming convention. The following folders need to be created within this bucket: syn-cluster-config, syn-cluster-data, syn-cluster-logs.
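The three resources above could be created from the command line roughly as sketched below. The snippet builds the gcloud commands and prints them; nothing is executed. The region, zone, PostgreSQL version, machine sizing, and disk size are hypothetical placeholders, since the actual settings are provided by the Syntasa services team.

```python
import shlex
from typing import List

PROJECT = "syntasa-app-project"   # suggested project name from this article
REGION = "us-central1"            # hypothetical region
ZONE = "us-central1-a"            # hypothetical zone
BUCKET = "syntasa-app"            # suggested name; bucket names are globally unique
FOLDERS = ["syn-cluster-config", "syn-cluster-data", "syn-cluster-logs"]


def resource_commands() -> List[List[str]]:
    """Build the commands for the metastore, the disk and the bucket."""
    return [
        # PostgreSQL version and sizing are placeholders, not Syntasa guidance.
        ["gcloud", "sql", "instances", "create", "syntasa-pg-metastore",
         "--database-version=POSTGRES_14", "--cpu=1", "--memory=3840MiB",
         f"--region={REGION}", f"--project={PROJECT}"],
        # pd-ssd is the SSD option; pd-standard is the standard persistent disk.
        ["gcloud", "compute", "disks", "create", "syntasa-kubernetes-pd",
         "--type=pd-ssd", "--size=100GB",
         f"--zone={ZONE}", f"--project={PROJECT}"],
        ["gcloud", "storage", "buckets", "create", f"gs://{BUCKET}",
         f"--location={REGION}", f"--project={PROJECT}"],
    ]


def folder_uris(bucket: str, folders: List[str]) -> List[str]:
    """Return the gs:// prefixes that act as folders inside the bucket."""
    return [f"gs://{bucket}/{folder}/" for folder in folders]


for command in resource_commands():
    print(shlex.join(command))
for uri in folder_uris(BUCKET, FOLDERS):
    print(uri)
```

Cloud Storage has a flat namespace, so the "folders" are really object-name prefixes. Creating them with the console's Create folder button is the simplest route; the sketch only lists the resulting paths.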
Options for installing the Syntasa software
As noted in the Installing the Syntasa platform section of the Syntasa Quick Start Guide, if you are performing the installation from GCP Marketplace, follow the instructions there and skip the remainder of this section.
However, if the Syntasa services team is performing the software installation then the following resources are needed on a compute instance that will be used for the installation:
- GCP project information - The Syntasa services team will confirm a number of aspects and the naming of the items set up above.
- Hostname and SSL certificates - The hostname and SSL certificates, if set up, are needed. If you do not have an SSL certificate, please create a self-signed certificate and have the cert.pem and key.pem files ready to go. If you require assistance with this, the Syntasa services team can help.
- Google CLI tools - The Google CLI tools need to be installed on the machine and configured to use an administrator user's credentials. To install them, please see the Google links below:
- Google Cloud SDK Install - https://cloud.google.com/sdk/
- Google Cloud SDK Usage - https://cloud.google.com/sdk/docs/initializing
- Java - Please download a Java 1.8+ JRE or JDK and set it up on the machine.
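The checklist above can be partly verified with a short script. The sketch below reports which of the required tools are missing from the PATH, interprets the version line printed by `java -version`, and builds an openssl command for a self-signed certificate without running it. The hostname "syntasa.example.com", the 2048-bit RSA key, and the 365-day validity are assumptions for illustration.

```python
import re
import shlex
import shutil
from typing import List, Optional, Sequence


def missing_tools(tools: Sequence[str] = ("gcloud", "java", "openssl")) -> List[str]:
    """Return the tools from the list that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Legacy strings such as "1.8.0_292" are reported as 8; newer strings
    such as "11.0.2" or "17" are reported as 11 and 17.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def self_signed_cert_command(hostname: str, days: int = 365) -> List[str]:
    """Build (but do not run) an openssl command producing cert.pem and key.pem."""
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", "key.pem", "-out", "cert.pem",
        "-days", str(days), "-subj", f"/CN={hostname}",
    ]


print("Missing tools:", missing_tools())
# "syntasa.example.com" is a hypothetical hostname.
print(shlex.join(self_signed_cert_command("syntasa.example.com")))
```

Note that `java -version` writes its output to standard error, so a script that captures it needs to read stderr rather than stdout. The `-nodes` option leaves the private key unencrypted, which is convenient for installation but means key.pem should be stored with restricted permissions.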