Creating a New Runtime
- Click to open the Runtime screen.
- Click the plus icon to add a new runtime.
- A "Create Runtime Template" screen will appear.
- Provide a unique "Name" for the runtime.
- Select the "Runtime Type" from the drop-down list.
- Based on the selections, provide the required details (see below).
- Click "Save".
The new runtime configuration will appear on the Runtime screen and will be available for use within an app.
Configurations
When using the Spark Runtime Type:
Name - Name of the runtime, which will be available in the drop-down menu when executing a job, e.g. "Adobe_Daily_Job".
Copy Runtime - Copies settings from an existing runtime. For example, if one of your jobs needs a bigger cluster than the default runtime provides, the easiest way to create the new runtime is to copy the default runtime, update the number of nodes, and save.
Runtime Type - The runtime that executes the job on the cluster. Spark should be selected for the majority of runtimes.
Default Development - This denotes the default runtime which will be automatically selected for a development job.
Default Production - This denotes the default runtime which will be automatically selected for a production job.
Cluster Max Up Time - Controls the maximum time a cluster may stay up; the cluster is terminated once it reaches this limit. The default value is 12 hours.
Cluster Max Up Time Unit - This is the time unit for cluster max up-time defined in the previous field.
Cluster Base Name - Prefix of the cluster name that will appear in the DataProc/EMR console, e.g. "Syntasa-cluster-XXXX-XXXX".
Cluster Release Label - This sets the version of the DataProc cluster the system will create. Generally, select the highest version number if it is not already selected.
Terminate on Completion - Terminates the cluster once the job finishes.
Use Private IP - When enabled, cluster nodes are created with private (internal) IP addresses only.
Zeppelin - This enables the Zeppelin notebook attached to the cluster.
Jupyter Notebook - This enables the Jupyter notebook attached to the cluster.
Master Instance Type - Machine type of the master node.
Worker Instance Type - Machine type of the worker node(s).
HDD in GB - Specify the hard drive size for all nodes in the cluster. The default is 500 GB.
Worker Instance Count - Number of worker nodes included in cluster creation.
Task Instance Count - Number of task workers included in the cluster. These machines are cheaper to run but may be pulled out of the cluster based on demand, so this is normally set to 0.
Idle Time Deletion Interval in Seconds - How long the cluster may remain idle before it is automatically deleted.
Cluster Mode:
- Standard (default) - One master node plus a configurable number of worker nodes.
- High Availability - Three master nodes and a configurable number of worker nodes. This configuration guards against the failure of a single master node causing the failure of an entire job (see the sketch after this list).
- Single - A single-node cluster that acts as both the master and worker node.
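The fields above ultimately describe a cluster definition that the platform creates for you. As a point of reference only, the sketch below shows roughly how the instance types, node counts, disk size, idle-deletion interval, max up time, private IP setting, and master-node count (Standard vs. High Availability) would look if expressed directly with the Google Cloud Dataproc Python client. The project, names, and values are hypothetical, and the platform's actual implementation may differ.

```python
import datetime

from google.cloud import dataproc_v1

# Illustrative only: a Dataproc cluster definition roughly equivalent to the
# runtime fields above. All names and values here are hypothetical.
cluster = dataproc_v1.Cluster(
    project_id="my-project",                    # hypothetical project ID
    cluster_name="syntasa-cluster-example",     # derived from "Cluster Base Name"
    config=dataproc_v1.ClusterConfig(
        gce_cluster_config=dataproc_v1.GceClusterConfig(
            internal_ip_only=True,              # "Use Private IP"
        ),
        master_config=dataproc_v1.InstanceGroupConfig(
            num_instances=1,                    # 3 for High Availability mode
            machine_type_uri="n1-standard-4",   # "Master Instance Type"
            disk_config=dataproc_v1.DiskConfig(boot_disk_size_gb=500),  # "HDD in GB"
        ),
        worker_config=dataproc_v1.InstanceGroupConfig(
            num_instances=4,                    # "Worker Instance Count"
            machine_type_uri="n1-standard-4",   # "Worker Instance Type"
            disk_config=dataproc_v1.DiskConfig(boot_disk_size_gb=500),
        ),
        secondary_worker_config=dataproc_v1.InstanceGroupConfig(
            num_instances=0,                    # "Task Instance Count"
        ),
        software_config=dataproc_v1.SoftwareConfig(
            image_version="2.1",                # "Cluster Release Label"
        ),
        lifecycle_config=dataproc_v1.LifecycleConfig(
            idle_delete_ttl=datetime.timedelta(hours=1),   # "Idle Time Deletion Interval"
            auto_delete_ttl=datetime.timedelta(hours=12),  # "Cluster Max Up Time"
        ),
    ),
)
# A definition like this would be submitted through the Dataproc API
# (e.g. ClusterControllerClient.create_cluster); nothing here needs to be run by hand.
```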
Deploy Mode:
- Client (default) - The driver program runs where the job is submitted; in our case, on the master node.
- Cluster - The cluster controller decides where the driver program will run, based on available resources across the cluster.
Config Setting - Spark configuration properties (setting/value pairs) applied to jobs run on this runtime. A number of default configuration settings are listed below, and an illustrative example of applying them follows the table:
| Config Setting | Config Value | Meaning |
|---|---|---|
| spark.executor.memory | 4G | Amount of memory to use per executor process |
| spark.driver.memory | 5G | Amount of memory to use for the driver process, i.e. where SparkContext is initialized |
| spark.executor.cores | 1 | The number of cores to use on each executor |
| spark.sql.autoBroadcastJoinThreshold | 1000000000 | The maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. Setting this value to -1 disables broadcasting |
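These are standard Spark property names, so for context the snippet below shows how the same values would look if set by hand when building a PySpark session. It is a minimal, illustrative sketch (the app name is hypothetical); the runtime applies the configured values for you, so nothing like this needs to be written into a job.

```python
from pyspark.sql import SparkSession

# Minimal sketch: the table's config settings expressed as Spark properties.
spark = (
    SparkSession.builder
    .appName("adobe_daily_job")                    # hypothetical app name
    .config("spark.executor.memory", "4G")         # memory per executor process
    .config("spark.driver.memory", "5G")           # memory for the driver, where SparkContext is initialized
    .config("spark.executor.cores", "1")           # cores per executor
    .config("spark.sql.autoBroadcastJoinThreshold", "1000000000")  # max table size (bytes) to broadcast; -1 disables
    .getOrCreate()
)
# Deploy mode (Client vs. Cluster) is normally chosen when the job is submitted
# (e.g. spark-submit's --deploy-mode flag) rather than inside the job code.
```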
Expected Output
The expected behavior is that the new runtime will appear on the Runtime screen and it will be available as a separate runtime when processing an app job.
A separate runtime can be used for each step or job of a workflow in an app, depending on the processing requirements.