The following outlines the resources and services generally required to install the Syntasa platform on-premises. Actual requirements will vary with your data volume, storage, and processing needs.
Application Server: One (1) dedicated server with 16 cores, 128GB RAM, and a 500GB hard drive is required to run the Syntasa application. This server must be dedicated to Syntasa 24x7.
Hadoop Cluster: Servers with 16 cores, 64GB RAM, and 2TB of hard drive space each are recommended. The total number of machines depends on the number of concurrent users and the required processing load. A good starting point is ten (10) servers, which should be enough to configure the initial datasets and get users up and running. The cluster can be scaled up as data processing volumes and/or the user base grow.
Of these ten servers, five (5) will be dedicated to running the visualization engine and the other five (5) will be dedicated to data processing.
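As a rough illustration of what the starting configuration adds up to, the sketch below totals the cores, RAM, and raw disk for the ten-server cluster and for each five-server half. The names `NodeSpec` and `cluster_totals` are hypothetical helpers, not part of Syntasa, and 2TB is assumed to mean 2000GB. The disk figures are raw capacity; usable HDFS space will be lower once replication is taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-server resources (hypothetical helper, not a Syntasa type)."""
    cores: int
    ram_gb: int
    disk_gb: int


def cluster_totals(node: NodeSpec, count: int) -> NodeSpec:
    """Return the aggregate resources of `count` identical servers."""
    return NodeSpec(node.cores * count, node.ram_gb * count, node.disk_gb * count)


# Per-server figures from the text; 2TB is assumed to be 2000GB.
HADOOP_NODE = NodeSpec(cores=16, ram_gb=64, disk_gb=2000)

STARTING_SERVERS = 10      # suggested starting cluster size
SERVERS_PER_ROLE = 5       # visualization engine / data processing split

whole_cluster = cluster_totals(HADOOP_NODE, STARTING_SERVERS)
per_role = cluster_totals(HADOOP_NODE, SERVERS_PER_ROLE)

if __name__ == "__main__":
    print("Whole cluster:", whole_cluster)
    print("Each five-server group:", per_role)
```

Changing `STARTING_SERVERS` or the per-server figures gives a quick way to compare candidate cluster sizes before committing to hardware.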
Services: The following services are required to run Syntasa on-premises:
- Spark 3
- YARN
- Hadoop
- Hive
- Presto, Impala, and/or Apache Hive Low Latency Analytical Processing (LLAP)
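One way to run a quick pre-install check on a cluster node is to confirm that a command-line client for each listed service is on the `PATH`. The sketch below is only an assumption about how such a check could look: the mapping `REQUIRED_CLIS` and the function `find_missing` are hypothetical, and the client names (`spark-submit`, `yarn`, `hadoop`, `hive`, `beeline`, `presto`, `impala-shell`) are the usual defaults, which may differ in your distribution. Finding a client does not prove the service is running or that its version is supported.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical mapping: required service -> CLI names that usually ship with it.
# A service counts as "found" if any one of its candidate clients is on PATH.
REQUIRED_CLIS: Dict[str, List[str]] = {
    "Spark 3": ["spark-submit"],
    "YARN": ["yarn"],
    "Hadoop": ["hadoop"],
    "Hive": ["hive", "beeline"],
    "Presto, Impala, or Hive LLAP": ["presto", "impala-shell", "beeline"],
}


def find_missing(
    which: Callable[[str], Optional[str]] = shutil.which,
    required: Dict[str, List[str]] = REQUIRED_CLIS,
) -> List[str]:
    """Return the services for which no candidate client was found."""
    missing = []
    for service, candidates in required.items():
        if not any(which(name) for name in candidates):
            missing.append(service)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("No client found for:", ", ".join(missing))
    else:
        print("A client was found for every required service.")
```

The `which` parameter exists so the lookup can be replaced in tests or pointed at a different discovery mechanism.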