Syntasa notebooks support three mechanisms for customizing your Spark environment: init scripts, dependency properties, and standard Spark file configs. These work across all notebook flows and both Python and Scala kernels.
Notebook Flows
Interactive Notebook Workspaces
These use a shared Kubernetes cluster. The kernel pod (driver) starts when you open the notebook. No runtime template is required. Init scripts and dependencies from the global config run at kernel startup. Both Python and Scala kernels follow the same bootstrap flow.
Interactive Notebook Cards with External Cluster
These start as interactive notebooks, but the user attaches a dedicated runtime (Syntasa Runtime Template). When the runtime is attached, the runtime template's init script and dependencies run on the driver, then a new Spark session is created with executors on the dedicated cluster. Each executor runs global, notebook, and runtime init scripts during its own bootstrap. This works for both Kubernetes runtimes and EMR/Dataproc YARN clusters.
Notebook Process (Jobs)
These are scheduled or manually triggered notebook executions. The runtime template is specified at job submission time. Global init scripts and configs run during bootstrap. Runtime init scripts and dependencies run when the Spark session is created. Executors run all init scripts during their bootstrap.
Configuration Layers
Configuration is applied in two layers, with the runtime layer overriding the global layer when both set the same property. Global Configuration is set in Platform Settings and applies to all notebooks.
It includes a global init script (notebookConfig > script) and global Spark configs including dependency properties (runtimeConfig > sparkConfig).
Global Spark configs are written to spark-defaults.conf at bootstrap, so both Python and Scala kernels pick them up automatically.
Runtime Template Configuration is set per runtime template and applies when that runtime is attached or used for a job. It includes a runtime init script and runtime-specific Spark configs.
These configs can live at sparkCluster > sparkConfig > configs or container > image > sparkConfig > configs (image configs take precedence if both exist).
Runtime configs are merged at Spark session creation time, overriding global defaults.
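For example (a sketch with hypothetical values): if the global config sets spark.executor.memory to 2g and the attached runtime template sets it to 4g, the session created after attach reflects the runtime value.

    # Runtime configs are merged last, so the runtime value wins
    print(spark.conf.get("spark.executor.memory"))  # prints 4g, not the global 2g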
Init Scripts
Init scripts are bash scripts that run on both the driver and every K8s executor pod. They are useful for setting environment variables, downloading custom JARs, installing system certificates, configuring files, or any other shell-level setup.
Global Init Script is configured at Platform Settings > Notebook Config > Script and runs first on every kernel and executor pod. A corresponding global dependency script is auto-generated from global Spark config properties.
Notebook Init Script is configured per notebook card and runs after the global init script.
Runtime Init Script is configured inside the runtime template under image settings. On the driver, this runs at Spark session creation time (not at bootstrap), because the runtime may be attached after the kernel started. On executors, it runs during bootstrap. A corresponding runtime dependency script is auto-generated from the runtime template's Spark config properties.
Init Script for Custom JARs — Init scripts can download JARs directly to $SPARK_HOME/jars/.
Any JAR placed in this directory is automatically on the Spark classpath for both Python and Scala kernels, with no additional configuration needed.
Example: wget -q -O $SPARK_HOME/jars/myjar.jar "https://repo1.maven.org/.../myjar.jar"
Supported Dependency Properties
Dependencies are configured as Spark config properties, either in global config or runtime template config.
Python Dependencies
syntasa.python.enable.dependencies must be set to true to enable Python dependencies. syntasa.python.dependencies.names.online accepts PyPI package names delimited by :: (example: humanize::tabulate::requests==2.31.0); these are installed via pip on both the driver and every executor. syntasa.python.dependencies.names accepts filenames (wheels, eggs, zips) delimited by commas, uploaded to {bucket}/{configFolder}/deps/python/.
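A quick driver-side check that a pinned online package landed (a sketch, using the requests==2.31.0 pin from the example above):

    import requests

    # The installed version should match the pin from the dependency property
    print(requests.__version__)  # expect 2.31.0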
JAR Dependencies
syntasa.jar.enable.dependencies must be set to true to enable JAR dependencies. syntasa.jar.dependencies.names accepts JAR filenames delimited by commas, uploaded to {bucket}/{configFolder}/deps/jars/ and downloaded to $SPARK_HOME/jars/ (auto-classpath for both Python and Scala). syntasa.jar.dependencies.names.online accepts Maven coordinates (example: org.xerial:sqlite-jdbc:3.45.1.0), downloaded from Maven Central to $SPARK_HOME/jars/.
Standard Spark File Configs
spark.jars cloud paths are downloaded to $SPARK_HOME/jars/ on the driver and the config is stripped from SparkConf (auto-classpath). spark.jars.packages Maven coordinates are likewise downloaded to $SPARK_HOME/jars/ and stripped. spark.files and spark.pyFiles cloud paths are passed through to Spark natively.
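Since spark.files paths are passed through natively, files listed there can be resolved at runtime via SparkFiles; a minimal sketch, assuming a hypothetical lookup.csv was listed in spark.files:

    from pyspark import SparkFiles

    # Resolve the local path of a file distributed via spark.files
    local_path = SparkFiles.get("lookup.csv")  # "lookup.csv" is a hypothetical name
    print(local_path)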
All syntasa.* custom properties are stripped from SparkConf before session creation.
Configuration Examples
Example 1 — Online Python Packages
syntasa.python.enable.dependencies = true and syntasa.python.dependencies.names.online = humanize::tabulate. Both the driver and all executors will pip install humanize and tabulate.
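Once the kernel is up, the packages are importable; a small usage sketch:

    import humanize
    from tabulate import tabulate

    print(humanize.intword(1234567))  # "1.2 million"
    print(tabulate([["rows", 1234567]], headers=["metric", "value"]))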
Example 2 — Offline Python Package
Upload wheel to s3://your-bucket/syn-cluster-config/deps/python/my_lib-1.0-py3-none-any.whl, then set syntasa.python.enable.dependencies = true and syntasa.python.dependencies.names = my_lib-1.0-py3-none-any.whl.
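A one-line sanity check (a sketch, assuming the wheel provides a my_lib module):

    import my_lib  # succeeds if the wheel was downloaded and pip-installed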
Example 3 — Maven JARs
syntasa.jar.enable.dependencies = true and syntasa.jar.dependencies.names.online = org.xerial:sqlite-jdbc:3.45.1.0.
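Once the driver JAR is on the classpath, Spark's JDBC reader can use it directly; a usage sketch with a hypothetical database path and table name:

    # /tmp/example.db and the "events" table are hypothetical
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlite:/tmp/example.db")
          .option("dbtable", "events")
          .option("driver", "org.sqlite.JDBC")
          .load())
    df.show()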
Example 4 — Init Script with JAR Download + Env Vars
Set the following as the global init script; it runs on every driver and executor pod:

    #!/bin/bash
    export DATABASE_URL="jdbc:postgresql://db.internal:5432/analytics"
    wget -q -O $SPARK_HOME/jars/jfiglet-0.0.9.jar "https://repo1.maven.org/maven2/com/github/lalyos/jfiglet/0.0.9/jfiglet-0.0.9.jar"
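To verify from a notebook cell that the script took effect (a sketch; the FigletFont class name is an assumption based on the jfiglet artifact, not stated above):

    import os

    # The exported variable should be visible in the kernel environment
    print(os.environ.get("DATABASE_URL"))
    # Classpath check from the Python kernel; class name assumed for jfiglet 0.0.9
    spark._jvm.Class.forName("com.github.lalyos.jfiglet.FigletFont")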
Execution Order
Driver Bootstrap (Pod Startup)
First, the global init script runs. Second, the global dependency script runs (downloading JARs to $SPARK_HOME/jars/ and pip-installing packages). Third, the notebook init script runs. Fourth, global Spark configs are written to spark-defaults.conf. The runtime init script does NOT run at bootstrap.
Driver — Spark Session Creation (Runtime Attach)
The runtime init script is downloaded and executed, then the runtime dependency script runs (JARs + pip installs). Runtime Spark configs are merged into SparkConf, overriding globals, and the Spark session is created. The flow is the same for both Python and Scala kernels.
K8s Executor Pods
The global init script runs, then the global dependency script (pip installs only), then the notebook init script. The runtime init script runs if KERNEL_TEMPLATE_ID is available, followed by the runtime dependency script (pip installs only). Ad-hoc packages from installPyPI/installCondaPackage are installed, and finally the executor JVM starts.
YARN Executors (EMR/Dataproc)
YARN executors run on cluster nodes, not K8s pods. They do not run bootstrap or init scripts. Python packages reach YARN executors via conda_pack. JARs from the driver are distributed by Spark natively.
Log Files
/logs/global_init_script.log: global init script output.
/logs/global_init_configs.log: global dependency script output.
/logs/notebook_init_script.log: notebook init script output.
/logs/runtime_init_script.log: runtime init script output.
/logs/runtime_init_configs.log: runtime dependency script output.
/logs/syntasa_kernel.log: aggregated output from all scripts plus kernel logs.
Troubleshooting
Check whether init scripts ran: look for the script files at /tmp/global_init_script.sh, /tmp/notebook_init_script.sh, /tmp/runtime_init_script.sh, /tmp/global_init_configs.sh, /tmp/runtime_init_configs.sh.
Check logs: Read /logs/global_init_script.log, /logs/global_init_configs.log, /logs/runtime_init_script.log, /logs/runtime_init_configs.log.
Check dependency errors: Search logs for SYNTASA_DEP_ERROR.
Verify JAR on classpath: Python: spark._jvm.Class.forName("org.sqlite.JDBC"). Scala: Class.forName("org.sqlite.JDBC").
Verify Python package on executors: Use spark.sparkContext.parallelize(range(4), 4).mapPartitions(lambda _: [__import__("humanize").__version__]).collect().
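The same pattern works for checking init-script effects on executors; for instance (a sketch, reusing the DATABASE_URL variable from Example 4):

    # Returns the env var as seen inside each executor partition
    spark.sparkContext.parallelize(range(2), 2).mapPartitions(
        lambda _: [__import__("os").environ.get("DATABASE_URL")]).collect()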
Common Issues and Solutions
Global dependency script not found — Spark configs must be under runtimeConfig > sparkConfig in infrastructure settings, not under notebookConfig.
Runtime dependency script not found — Verify runtime template has Spark configs and was properly attached. Check /logs/syntasa_kernel.log.
Python package missing on executors — Check /logs/global_init_configs.log or /logs/runtime_init_configs.log. Verify internet access to PyPI.
JARs not on classpath — Check $SPARK_HOME/jars/ on the driver. Check dependency logs for SYNTASA_DEP_ERROR. You can also download JARs directly in an init script: wget -O $SPARK_HOME/jars/myjar.jar <url>.
spark.jars config seems ignored — Intentional. JARs are downloaded to $SPARK_HOME/jars/ and the config is stripped. No spark.jars entry needed.
Offline deps fail to download — Verify files at {bucket}/{configFolder}/deps/python/ or deps/jars/. Check IAM permissions. For high-side AWS, verify custom S3 endpoint.
Init script causes pod issues — Scripts are protected by safe_source_script (intercepts exit calls). Check the script log. Test in a minimal environment first.