Available in Syntasa environments installed in AWS, the AWS EMR Cluster runtime type uses the Amazon EMR service to run jobs, execute code in notebooks, and perform similar tasks, and it exposes the settings familiar from that service.
The basic runtime attributes required for all runtime types are detailed in Creating Runtime Templates; the settings available for this runtime type are detailed below.
Instance type and options
The AWS EMR Cluster runtime type provides several fields for the master and worker instance types that the runtime requires. The available instance types are listed in AWS's Supported Instance Types article.
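In Amazon EMR terms, the master node and the worker nodes are provisioned as instance groups, with workers running as "core" (and optionally "task") nodes. The sketch below is a hypothetical illustration, not how Syntasa builds its requests: it renders master and worker choices in the shape of the `InstanceGroups` list accepted by EMR's RunJobFlow API. The helper names and the `m5.*` instance types are placeholders, and nothing here calls AWS.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_ROLES = {"MASTER", "CORE", "TASK"}


@dataclass
class InstanceGroupSpec:
    """Hypothetical helper: one EMR instance group (not a Syntasa API)."""
    role: str           # EMR node role: "MASTER", "CORE", or "TASK"
    instance_type: str  # should be a type that EMR lists as supported
    count: int


def to_emr_instance_groups(specs: list[InstanceGroupSpec]) -> list[dict]:
    """Render specs in the shape of the InstanceGroups list that the
    EMR RunJobFlow API accepts (for example via boto3's run_job_flow)."""
    groups = []
    for spec in specs:
        if spec.role not in VALID_ROLES:
            raise ValueError(f"unknown instance role: {spec.role}")
        if spec.count < 1:
            raise ValueError("each instance group needs at least one instance")
        groups.append({
            "Name": f"{spec.role.lower()}-group",
            "Market": "ON_DEMAND",
            "InstanceRole": spec.role,
            "InstanceType": spec.instance_type,
            "InstanceCount": spec.count,
        })
    return groups


# Example: one master node and two worker (core) nodes; types are placeholders.
groups = to_emr_instance_groups([
    InstanceGroupSpec("MASTER", "m5.xlarge", 1),
    InstanceGroupSpec("CORE", "m5.2xlarge", 2),
])
for g in groups:
    print(g["InstanceRole"], g["InstanceType"], g["InstanceCount"])
```

Validating roles up front keeps a typo such as "WORKER" from surfacing only as an API error at cluster-creation time.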
Here's a brief description of each field of the AWS EMR Cluster runtime type:
- Cluster Base Name: The unique identifier for the AWS EMR cluster, which makes the cluster easy to identify and manage.
- Cluster Release Label: The version of the EMR software to use, which determines compatibility with specific features and enhancements. Default: 6.6.0.
- Runtime Max Uptime: The maximum duration the cluster can remain active before it automatically shuts down, measured in the unit set by Runtime Max Uptime Unit. Default: 12.
- Runtime Max Uptime Unit: The time unit (mins, hours, or days) for the Runtime Max Uptime setting. Default: Hours.
- Terminate on Completion (Toggle): When enabled, the cluster automatically terminates once the job or notebook execution is complete.
- Use Private IP (Toggle): Determines whether the cluster nodes communicate using private IP addresses, for enhanced security.
- Enable GPU (Toggle): Enables the use of GPU resources within the cluster for tasks that benefit from accelerated processing.
- Enable Debugging: Activates debugging features that help diagnose and resolve issues within the EMR cluster.
- Enable Autoscale: Automatically adjusts the number of EC2 instances based on workload demand to balance cost and performance.
- Default Role Name: The IAM role that, by default, grants the EMR cluster permission to access AWS services and resources.
- Instance Role Name: The IAM role assigned to the EC2 instances within the EMR cluster, which determines their permissions and access levels.
- Application: Several applications can be used with the AWS EMR Cluster runtime. Selecting an application's checkbox enables it on new cluster instances created from the runtime template.
- Deploy Mode (client/cluster): Determines where the Spark driver runs.
  - Client Mode: The driver runs on the client machine that submits the application, outside the cluster. This mode is preferred for debugging and troubleshooting.
  - Cluster Mode: The driver runs on a node inside the cluster, close to the executors, which reduces network latency between them. This mode is generally chosen for the best performance and minimal runtime delays.
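To make the field list more concrete, the sketch below maps a subset of these fields onto keyword arguments shaped like boto3's EMR `run_job_flow` call. It is a hypothetical illustration, not Syntasa's implementation. Assumptions: Default Role Name corresponds to EMR's `ServiceRole` and Instance Role Name to `JobFlowRole`; the role names, cluster name, and the default application list (`Spark`) are placeholders; Use Private IP, Enable GPU, Enable Debugging, and Enable Autoscale are not mapped; and instance groups are omitted. EMR has no direct "maximum uptime" request field, so the uptime is only converted to seconds for a hypothetical external watchdog.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Seconds per unit for the Runtime Max Uptime Unit choices (mins, hours, days).
UNIT_SECONDS = {"mins": 60, "hours": 3600, "days": 86400}


@dataclass
class EmrRuntimeSettings:
    """Hypothetical mirror of some AWS EMR Cluster runtime template fields."""
    cluster_base_name: str
    default_role_name: str
    instance_role_name: str
    terminate_on_completion: bool
    release_label: str = "6.6.0"    # documented default
    max_uptime: int = 12            # documented default
    max_uptime_unit: str = "hours"  # documented default
    applications: list[str] = field(default_factory=lambda: ["Spark"])  # assumption


def max_uptime_seconds(s: EmrRuntimeSettings) -> int:
    """Convert Runtime Max Uptime and its unit into seconds."""
    if s.max_uptime_unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {s.max_uptime_unit}")
    return s.max_uptime * UNIT_SECONDS[s.max_uptime_unit]


def to_run_job_flow_kwargs(s: EmrRuntimeSettings) -> dict:
    """Build keyword arguments shaped like boto3's EMR run_job_flow call.

    Assumed mapping: Default Role Name -> ServiceRole,
    Instance Role Name -> JobFlowRole. Instance groups are omitted.
    """
    label = s.release_label
    if not label.startswith("emr-"):
        label = f"emr-{label}"  # EMR release labels have the form "emr-6.6.0"
    return {
        "Name": s.cluster_base_name,
        "ReleaseLabel": label,
        "Applications": [{"Name": app} for app in s.applications],
        "ServiceRole": s.default_role_name,
        "JobFlowRole": s.instance_role_name,
        "Instances": {
            # When False, EMR terminates the cluster after its last step.
            "KeepJobFlowAliveWhenNoSteps": not s.terminate_on_completion,
        },
    }


settings = EmrRuntimeSettings(
    cluster_base_name="analytics-runtime",    # placeholder
    default_role_name="EMR_DefaultRole",      # placeholder
    instance_role_name="EMR_EC2_DefaultRole", # placeholder
    terminate_on_completion=True,
)
print(to_run_job_flow_kwargs(settings)["ReleaseLabel"])
print(max_uptime_seconds(settings))
```

After adding instance settings (for example an `InstanceGroups` list under `Instances`), a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**kwargs)`.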
Configuration options
Spark configuration settings are also available. Key settings for the number of cores and the amount of memory have default values that can be adjusted as needed. The other available settings are described in the Apache Spark documentation on Spark Configuration.
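As a rough illustration of how the Deploy Mode choice and the core and memory settings combine, the sketch below assembles a `spark-submit` command line for YARN, the cluster manager EMR uses for Spark. The property names (`spark.driver.memory`, `spark.executor.memory`, `spark.executor.cores`) and the `--master`, `--deploy-mode`, and `--conf` options come from Apache Spark; the default values, the `build_spark_submit` helper, and the application path are hypothetical placeholders, not the runtime's actual defaults.

```python
from __future__ import annotations

import shlex

# Real Spark property names; the values are placeholders, not Syntasa defaults.
DEFAULT_SPARK_CONF = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
}


def build_spark_submit(app: str, deploy_mode: str = "cluster",
                       overrides: dict[str, str] | None = None) -> list[str]:
    """Return a spark-submit argv; user overrides replace default values."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    conf = {**DEFAULT_SPARK_CONF, **(overrides or {})}
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    for key, value in sorted(conf.items()):  # sorted for a stable command line
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


# Example: raise executor memory for one job; the S3 path is hypothetical.
cmd = build_spark_submit("s3://my-bucket/job.py",
                         overrides={"spark.executor.memory": "8g"})
print(shlex.join(cmd))
```

Merging overrides on top of a default dictionary mirrors the behavior described above: the core and memory settings start from defaults, and only the values you change are replaced.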