As noted in the Syntasa Quick Start Guide, a few steps must be completed for the Syntasa platform to be installed and able to process data in your cloud: build the foundation within your virtual private cloud (VPC), install the Syntasa software on that foundation, and finally connect the software to the foundation.
This article details the first of those steps, Preparing the cloud environment, specifically for Azure:
Azure cloud setup
The following instructions require that a billing/subscription account is set up within the Azure environment. Also, the user performing the required preinstall steps should have the permissions needed to install and set up the following Azure cloud services:
- Resource Groups
- Virtual Networks | Subnets | Route Tables
- Network Security Groups
- Virtual Machines | Virtual Disks
- SQL Database
- Blob Storage
- Managed Identities
- HDInsight
It is recommended to create a custom Resource Group in Azure to hold all the requisite pieces in a single group. This makes it easier to manage the services used by Syntasa and to track their billing charges.
For any questions about setting up billing accounts or subscriptions, the Syntasa services team can assist.
Connectivity settings
From the Microsoft Azure Portal, navigate to Virtual Networks to create a new virtual network. On the first tab, Basics, the following information needs to be filled out:
- Subscription - Select the name of the Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Name - Provide a descriptive name for the virtual network, e.g. “Syntasa-Virtual-Network”.
- Region - Select the region that is to be used for this resource.
Continue to the following section to configure the IP addresses and subnets:
- IPv4 Address - Provide an IP address CIDR that is to be used. If your organization has requirements on CIDR ranges that can be used, please contact your DevOps/IT team for more assistance. Otherwise, it is suggested to use “10.0.0.0/16”.
- Compute Subnet - Under the subnets section, if a default subnet exists please click on the checkbox next to it and remove that subnet. Add a compute subnet by clicking the “+Add Subnet” button and complete the following information:
  - Subnet Name - Syntasa-Compute-Subnet
  - Subnet Address Range - 10.0.10.0/24
  - Service Endpoints - Please leave this section blank, i.e. do not select any services.
- Services Subnet - Add a services subnet by clicking the “+Add Subnet” button and complete the following information:
  - Subnet Name - Syntasa-Services-Subnet
  - Subnet Address Range - 10.0.11.0/24
  - Service Endpoints - Please leave this section blank, i.e. do not select any services.
Under the Security section please keep the default values:
- BastionHost - Disabled
- DDoS Protection - Disabled
- Firewall - Disabled
Lastly, under the Tags section add any tags, i.e. key/value pairs, if needed. Once complete, click "Review + Create" to create the virtual network.
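Before entering the address ranges in the portal, it can help to confirm that each subnet sits inside the virtual network's address space and that the subnets do not overlap. The following is a minimal sketch using only the Python standard library; the helper name `check_address_plan` is hypothetical, and the ranges are the ones suggested above, so adjust them if your organization uses different CIDR blocks.

```python
import ipaddress

# Address space and subnet ranges suggested in this article.
VNET_CIDR = "10.0.0.0/16"
SUBNETS = {
    "Syntasa-Compute-Subnet": "10.0.10.0/24",
    "Syntasa-Services-Subnet": "10.0.11.0/24",
}


def check_address_plan(vnet_cidr, subnets):
    """Return a list of problems found in the plan (empty if none)."""
    vnet = ipaddress.ip_network(vnet_cidr)
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # Every subnet must be contained in the virtual network's range.
    for name, net in parsed.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")

    # No two subnets may share addresses.
    names = sorted(parsed)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if parsed[first].overlaps(parsed[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    issues = check_address_plan(VNET_CIDR, SUBNETS)
    print("\n".join(issues) if issues else "Address plan is consistent")
```

This check runs locally and does not contact Azure; it only catches arithmetic mistakes in the planned ranges.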
Virtual machine setup
The Syntasa software will be installed within a virtual machine. The following section covers the connectivity configuration required for the virtual machine as well as the settings for the virtual machine itself.
Network security group
A network security group needs to be created to allow the virtual machine access to HDInsight and for Syntasa application user administration. Please contact your DevOps/IT team if there are specific IP addresses that need to be whitelisted for access from your office or VPN.
From the Microsoft Azure Portal, navigate to Network Security Groups to create a new security group. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Instance Name - Provide a descriptive name for the security group, e.g. “Syntasa-App-Security-Group”.
- Region - Please select the same region as selected for the virtual network in the connectivity settings section above.
- Tags - Please add any tags as needed.
Next, under the Tags section, add any tags if needed. Once complete, click "Review + Create" to create the security group.
Once created, navigate back to the newly created security group and add the following Inbound Security Rules:
| Priority | Name | Port | Protocol | Source | Destination | Action |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | App-Node-Access | 443 | Any | <Your Office/VPN/IP> | VirtualNetwork | Allow |
| 300 | HDI_Health_Checks | 443 | Any | 168.61.49.99/32, 23.99.5.239/32, 168.61.48.131/32, 138.91.141.162/32, 13.82.225.233/32, 40.71.175.99/32 | VirtualNetwork | Allow |
| 310 | HDI_DNSServices | 53 | Any | 168.63.129.16/32 | VirtualNetwork | Allow |
| 65000 | AllowVnetInBound | Any | Any | VirtualNetwork | VirtualNetwork | Allow |
| 65001 | AllowAzureLoadBalancer | Any | Any | AzureLoadBalancer | Any | Allow |
| 65500 | DenyAllInbound | Any | Any | Any | Any | Deny |
Please note, some of the above rules might be set by default. If so, please leave them as-is and skip them when adding rules from the above table.
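Network security group rules are evaluated in priority order, lowest number first, and the first rule that matches decides whether traffic is allowed. The sketch below models that behavior for the inbound rules above so that a planned rule set can be reasoned about offline. It is an illustration under stated assumptions, not how Azure implements the check: the service tags are replaced with hypothetical stand-in ranges, the office range `203.0.113.0/24` is a placeholder, only a subset of the health-check addresses is included, and destination and protocol are ignored.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-ins: Azure resolves service tags itself.
VNET = ("10.0.0.0/16",)                # assumes the suggested address space
LOAD_BALANCER = ("168.63.129.16/32",)  # assumed value for AzureLoadBalancer
ANY = ("0.0.0.0/0",)
OFFICE = ("203.0.113.0/24",)           # placeholder for <Your Office/VPN/IP>
# Subset of the HDI_Health_Checks sources, shortened for the example.
HDI_HEALTH = ("168.61.49.99/32", "23.99.5.239/32", "168.61.48.131/32")


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    port: Optional[int]        # None means "Any"
    sources: Tuple[str, ...]   # CIDR ranges the rule applies to
    action: str


RULES = [
    Rule(100, "App-Node-Access", 443, OFFICE, "Allow"),
    Rule(300, "HDI_Health_Checks", 443, HDI_HEALTH, "Allow"),
    Rule(310, "HDI_DNSServices", 53, ("168.63.129.16/32",), "Allow"),
    Rule(65000, "AllowVnetInBound", None, VNET, "Allow"),
    Rule(65001, "AllowAzureLoadBalancer", None, LOAD_BALANCER, "Allow"),
    Rule(65500, "DenyAllInbound", None, ANY, "Deny"),
]


def first_match(rules, src_ip, port):
    """Return the lowest-priority-number rule matching the source and port."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.port is None or rule.port == port
        source_ok = any(addr in ipaddress.ip_network(c) for c in rule.sources)
        if port_ok and source_ok:
            return rule
    return None


if __name__ == "__main__":
    for ip, port in [("203.0.113.7", 443), ("203.0.113.7", 22)]:
        rule = first_match(RULES, ip, port)
        print(ip, port, rule.name, rule.action)
```

With these stand-ins, HTTPS from the placeholder office range is allowed by App-Node-Access, while SSH from the same range falls through to DenyAllInbound.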
Virtual machine
From the Microsoft Azure Portal, navigate to Virtual Machines to create a new virtual machine. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Virtual Machine Name - Provide a descriptive name for the virtual machine, e.g. “Syntasa-Application-Server”. Azure virtual machine names cannot contain spaces.
- Region - Please select the same region as selected for the virtual network in the connectivity settings section above.
- Availability Options - No Infrastructure Redundancy Required
- Image - Please select a CentOS 7+ image.
- Azure Spot Instance - No
- Size - Please select the E8s_v3 instance type with 8 vCPUs and 64 GiB of memory.
- Authentication Type - Depending on your DevOps/IT team requirements, please select SSH or Password.
- Inbound Port Rules - Please select none for now. The network security group created above will be attached to this instance in the network settings below.
Continue to the next section to configure the disks with the following information:
- OS Disk Type - Premium SSD
- Encryption Type - Default Encryption at-rest with platform managed key
- Other options - Please leave other options as their default values.
Continue to the next section to configure the network settings with the following information:
- Virtual Network - Please select the virtual network created in the connectivity settings section above.
- Subnet - Please select the subnet “Syntasa-Compute-Subnet” created in the connectivity section above.
- Public IP - Please click the “Create New” button and fill in the following information:
  - Name - Provide a descriptive name for the public IP, e.g. “Syntasa-App-Server-Static-IP”.
  - SKU - Basic
  - Assignment - Static
- NIC Network Security Group - Advanced
- Configure Network Security Group - Please select the security group created above, e.g. “Syntasa-App-Security-Group”.
- Accelerated Networking - Off
- Load Balancing - No
Continue to the next section, Management, to configure the following information:
- Diagnostics Storage Account - Please create a new storage account to store diagnostics logs if required.
- Other options - Please leave other options as their default values.
In the next section, Advanced, please leave other options as their default values. In the following section, Tags, please add any tags as needed. Finally, review and create the virtual machine.
Compute and storage setup
The Syntasa application requires Azure resources to process data and to store the generated and transformed results. The following sections cover the setup of these resources: SQL server and database, blob storage, and an HDInsight cluster.
SQL database
SQL database server
Before creating a metastore, we need to create a SQL Server Instance to house our new database. From the Microsoft Azure Portal, navigate to SQL Servers. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Server Name - Please enter a server name, e.g. “syntasa-sql-server”. Azure SQL server names may contain only lowercase letters, numbers, and hyphens.
- Location - Please select the same region as selected for the virtual network in the connectivity settings section above.
- Server Admin Login - Please enter a username, e.g. “syntasaadmin”. Azure rejects typical system names such as “admin” or “root”.
- Password - Please select a strong password. This should be securely stored.
Continue to the next section, Networking, to configure the following:
- Please check the box “Allow Azure services and resources to access this server”.
In the next section, Additional Settings, please leave other options as their default values. In the following section, Tags, please add any tags as needed. Finally, review and create the SQL server.
SQL metastore
From the Microsoft Azure Portal, navigate to SQL Databases. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Database Name - Provide a descriptive name for the database, e.g. “SyntasaMetastore”.
- Server - Please select the SQL server created in the previous step.
- SQL Elastic Pool - No
- Compute+Storage - Please select a Gen5 configuration with 2 vCores and 32 GB of storage.
In the next sections please provide the desired settings or defaults as indicated below:
- Networking - Please leave other options as their default values.
- Additional Settings - If data backups are desired then please specify in this section. Otherwise, leave the fields as their default values.
- Tags - Please add any tags as needed.
Finally, review to ensure all settings are correct and then create the database.
Blob storage
Blob storage stores the data processed by the Syntasa application and is used by the HDInsight cluster that is created in the next section. The Syntasa application can use blob storage gen1 or Data Lake Storage. An existing storage account can be used if desired; to create a new one, please follow the steps below.
From the Microsoft Azure Portal, navigate to Storage Accounts. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
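- Storage Account Name - Provide a descriptive name for the storage account, e.g. “syntasadatabucket”. Storage account names must be 3 to 24 characters long and may contain only lowercase letters and numbers.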
- Location - Please select the same region as selected for the virtual network in the connectivity settings section above.
- Performance - Standard
- Account Kind - StorageV2
- Replication - Leave default value.
- Access Tier - Hot
Continue to the next section, Networking, to configure the following:
- Connectivity Method - Public Endpoint
- Network Routing - Microsoft Network Routing
In the next sections please provide the desired settings or defaults as indicated below:
- Data Protection - Unless there is additional guidance from your DevOps/IT team, please leave other options as their default values.
- Advanced - Please select Hierarchical Namespace as Enabled and leave all other fields as default.
- Tags - Please add any tags as needed.
Finally, review to ensure all settings are correct and then create the storage account.
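Azure storage account names are more restrictive than most other resource names: they must be 3 to 24 characters long, use only lowercase letters and numbers, and be unique across Azure. The sketch below shows one way to turn a human-friendly label into a candidate name and check it against the length and character rules. The helper names are hypothetical, and uniqueness cannot be checked locally, so the portal may still reject a name that passes here.

```python
import re

# 3 to 24 characters, lowercase letters and digits only.
_VALID_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name):
    """Check the length and character rules (not global uniqueness)."""
    return _VALID_NAME.fullmatch(name) is not None


def to_storage_account_name(label):
    """Derive a candidate name from a label such as 'Syntasa-Data-Bucket'."""
    # Lowercase, drop anything that is not a letter or digit, cap the length.
    candidate = re.sub(r"[^a-z0-9]", "", label.lower())[:24]
    if not is_valid_storage_account_name(candidate):
        raise ValueError(f"cannot derive a valid name from {label!r}")
    return candidate


if __name__ == "__main__":
    print(to_storage_account_name("Syntasa-Data-Bucket"))
```

A label with hyphens or capital letters fails the direct check but can usually be normalized into an acceptable name this way.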
HDInsight cluster
Lastly, a compute resource needs to be created for the Syntasa application to run jobs and process data. Within Azure, the application relies on an HDInsight cluster, specifically an always-on cluster, because a normal HDInsight cluster takes up to an hour to provision.
From the Microsoft Azure Portal, navigate to HDInsight Clusters. Under the Basics section please fill in the following items:
- Subscription - Select the name of your Billing Subscription from the drop-down.
- Resource Group - Select the name of the Resource Group created for the Syntasa installation.
- Cluster Name - Provide a descriptive name for the cluster, e.g. “Syntasa-HDI-Cluster”.
- Region - Please select the same region as selected for the virtual network in the connectivity settings section above.
- Cluster Type - Please select a Spark cluster, version 2.2 or above.
- Cluster Credentials - Please set usernames and passwords as needed.
- Use Cluster login password for SSH - Please select this option.
In the next sections please provide the desired settings or defaults as indicated below:
- Storage - Please select the Blob Storage configured above.
- External Metadata Stores - Please select the SQL Server and Metastore DB created above.
- Security + Networking:
  - Virtual Network - Please select the virtual network created in the connectivity settings section above.
  - Subnet - Please select the subnet created in the connectivity settings section above.
- Configuration+Pricing - Please select the Head Node and Worker Node sizes as needed, along with the number of cores. Please contact the Syntasa DevOps/IT team if you need assistance.
- Tags - Please add any tags as needed.
Finally, review to ensure all settings are correct and then create the cluster.
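Several steps in this article ask for the same region as the virtual network. A short checklist script can catch a mismatch before the resources are created. The sketch below is illustrative only: the plan dictionary, its keys, and the region value `eastus` are assumptions made for the example, not values required by Syntasa.

```python
# Planned region per resource; keys and values are illustrative assumptions.
PLAN = {
    "virtual_network": "eastus",
    "network_security_group": "eastus",
    "virtual_machine": "eastus",
    "sql_server": "eastus",
    "storage_account": "eastus",
    "hdinsight_cluster": "eastus",
}


def region_mismatches(plan, reference="virtual_network"):
    """Return the resources whose region differs from the reference resource."""
    expected = plan[reference]
    return sorted(name for name, region in plan.items() if region != expected)


if __name__ == "__main__":
    mismatched = region_mismatches(PLAN)
    if mismatched:
        print("Region differs from the virtual network:", ", ".join(mismatched))
    else:
        print("All resources share the virtual network's region")
```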
The Syntasa Azure Infrastructure setup is complete. Please contact the Syntasa services team for any needed assistance.