Init Scripts & Dependencies – SYNTASA™

There are three places to put setup that runs before your notebook does anything: a global script that the platform runs everywhere, a runtime script attached to a specific runtime template, and a per-notebook script that you write yourself. Each layer can build on, or override, the one before it. This article explains the three layers, how they execute, where to write each one, and how to declare Python and JAR dependencies that the platform installs for you.

The three layers

Global init script

Set by an admin in Platform Settings → Notebook Config → Script. Runs for every notebook in every workspace. The right place for setup that everyone in the organization needs — system certificates, base environment variables, shared utility installs.

Runtime init script

Set by an admin on a runtime template, in the template's image settings. Runs only when that runtime template is attached. The right place for setup that's specific to the template — libraries the template depends on, connector configurations, anything that only makes sense once that template is attached. See Runtime Attachment in JupyterLab for runtime templates and how attachment works.

Notebook init script (new in 9.1)

Set by you, per notebook, from a toolbar button. Runs only for this notebook. The right place for setup that only this notebook needs — a one-off package, an env var your notebook references, or anything you want to override from the layers above. The script is stored inside the notebook's .ipynb metadata, so it travels with the file: copy the notebook to another workspace and the script comes with it.

Order of execution

Your scripts run in the order Global → Runtime → Notebook. The notebook script runs last, so it can extend or override anything set up by the global or runtime scripts above it. The same scripts, in the same order, also run on every Spark executor pod the kernel creates — you don't need to think about driver vs executor as separate things.

The three init script layers and the order they execute in.
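
The override behavior can be pictured with a small bash sketch. The three functions below are hypothetical stand-ins for the global, runtime, and notebook scripts, and the variable names (LOG_LEVEL, APP_ENV) are invented for illustration. It models only the ordering described above, not how the platform actually invokes each script.

```shell
#!/bin/bash
# Illustration only: three functions stand in for the three init script layers.
global_init()   { export LOG_LEVEL="INFO"; export APP_ENV="shared"; }
runtime_init()  { export APP_ENV="staging"; }      # overrides the global value
notebook_init() { export APP_ENV="production"; }   # overrides the runtime value

# Run the layers in the documented order: Global, then Runtime, then Notebook.
global_init
runtime_init
notebook_init

echo "APP_ENV=$APP_ENV LOG_LEVEL=$LOG_LEVEL"
```

Running it prints `APP_ENV=production LOG_LEVEL=INFO`: the last layer to set APP_ENV wins, while LOG_LEVEL from the global layer is left untouched.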

Editing your notebook's init script

Every notebook gets an init script editor on its toolbar. Click the icon (a terminal-prompt glyph next to the Spark UI button) and a modal opens with a plain bash text area. Edit the script, click Save, and the kernel restarts so the new script takes effect. To remove the script entirely, clear the text area and Save — an empty script is treated as no script.

The init script toolbar button and the modal editor — a plain bash text area with Save and Cancel.

Concurrent edits

If a workspace member edits the script while your modal is open, you'll see a dialog on Save: "This script was modified by another user since you opened it. Overwrite their changes?" Click Cancel to back out and re-open the editor (you'll see their version), or Overwrite to replace their version with yours.

Upgrading from 9.0.x — notebook card init scripts auto-migrate

In 9.0.x, notebook card init scripts were stored in syn-app card metadata in the platform database. The first time you open a 9.1 workspace, any existing card scripts are automatically migrated into the .ipynb metadata so the new toolbar editor can read them. Nothing for you to do — the script is right there when you open the editor.

What to put in an init script

All three init scripts are bash. You can do anything shell-level: set environment variables, download files, install system certificates, configure files in well-known locations, or shell out to package managers. The script's output goes to a log file; see Init Script Logs & Troubleshooting for log paths and how to inspect them.

Tip — drop custom JARs straight into Spark's classpath

Anything you wget into $SPARK_HOME/jars/ is on the Spark classpath automatically — no Spark config needed, no other declaration required. This works for both Python and Scala kernels.

wget -q -O "$SPARK_HOME/jars/myjar.jar" "https://repo1.maven.org/maven2/.../myjar.jar"

Note: a failed wget (404, network error) does not fail the kernel boot; the JAR is just silently missing. Add set -e at the top of your init script if you want a hard failure on download errors, or check wget's exit code yourself.
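
One way to check the exit code explicitly is a small wrapper around wget. This is a minimal sketch: fetch_jar is a hypothetical helper name, not something the platform provides, and the example call is commented out so the block does no network I/O as written.

```shell
#!/bin/bash
# fetch_jar URL DEST: download URL to DEST, returning non-zero on any failure.
# fetch_jar is a hypothetical helper name used for illustration.
fetch_jar() {
  local url="$1" dest="$2"
  # wget exits non-zero on HTTP errors (such as 404) and on network failures.
  if ! wget -q -O "$dest" "$url"; then
    echo "init script: failed to download $url" >&2
    rm -f "$dest"   # wget -O creates the output file even when the download fails
    return 1
  fi
}

# Example call; "|| exit 1" turns a failed download into a failed script:
# fetch_jar "https://repo1.maven.org/maven2/.../myjar.jar" "$SPARK_HOME/jars/myjar.jar" || exit 1
```

Compared with a blanket set -e, this keeps the failure message specific to the download that broke and avoids leaving an empty JAR file in $SPARK_HOME/jars/.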

Declaring dependencies — Python packages

As an alternative to init scripts, the platform has built-in dependency properties that install Python packages on the driver and on every Spark executor, handling the install and distribution for you. Set them as Spark config properties in either the global config or a runtime template's config. (Already in a session and just need a package installed without restarting? See synutils.lib.installPyPI in Notebook Utilities (synutils) instead.)

Online — install from PyPI

Set both properties:
Spark config
syntasa.python.enable.dependencies        = true
syntasa.python.dependencies.names.online  = humanize::tabulate::requests==2.31.0

Package names are delimited by ::. The platform pip-installs each package on the driver and on every executor.
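
To sanity-check a value before saving it, you can preview how it breaks apart on the :: delimiter. The sketch below assumes a plain split on the literal string "::"; split_specs is a hypothetical helper, and the platform's own parsing may differ in details such as whitespace handling.

```shell
#!/bin/bash
# split_specs VALUE: print one package spec per line from a "::"-delimited value.
# Hypothetical helper for previewing a config value; not a platform command.
split_specs() {
  printf '%s\n' "$1" | awk -F'::' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }'
}

split_specs 'humanize::tabulate::requests==2.31.0'
```

The preview lists humanize, tabulate, and requests==2.31.0 on separate lines, which shows that a == version pin inside a spec does not collide with the :: separator.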

Offline — install from your bucket

Upload your wheels, eggs, or zips to {bucket}/{configFolder}/deps/python/, then list the filenames:

Spark config
syntasa.python.enable.dependencies = true
syntasa.python.dependencies.names  = my_lib-1.0-py3-none-any.whl, helpers.zip
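
If you stage the files in a local folder before uploading, the filename list can be generated rather than typed. This is a rough sketch: names_from_dir is a hypothetical helper, the staging folder is a temporary placeholder, and the ", " separator is copied from the example value above.

```shell
#!/bin/bash
# names_from_dir DIR: print the file names in DIR joined with ", ".
# Hypothetical helper; the separator mirrors the example config value.
names_from_dir() {
  local dir="$1" out="" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
    out="${out:+$out, }$(basename "$f")"
  done
  printf '%s\n' "$out"
}

# Example: stage two placeholder files locally, then print the config line.
staging="$(mktemp -d)"
touch "$staging/my_lib-1.0-py3-none-any.whl" "$staging/helpers.zip"
echo "syntasa.python.dependencies.names = $(names_from_dir "$staging")"
rm -rf "$staging"
```

Files are listed in the shell's glob order (alphabetical), so the generated value may order names differently from a hand-written one.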

Declaring dependencies — JARs

Online — Maven coordinates

Spark config
syntasa.jar.enable.dependencies        = true
syntasa.jar.dependencies.names.online  = org.xerial:sqlite-jdbc:3.45.1.0

Maven JARs are downloaded from Maven Central to $SPARK_HOME/jars/ — automatically on the classpath.

Offline — JARs from your bucket

Upload JAR files to {bucket}/{configFolder}/deps/jars/, then list the filenames:

Spark config
syntasa.jar.enable.dependencies = true
syntasa.jar.dependencies.names  = my_connector.jar, custom_udf.jar

Standard Spark file configs

The platform also accepts the standard Spark properties for shipping files and JARs:

spark.jars: cloud paths are downloaded to $SPARK_HOME/jars/ on the driver and the property is stripped from SparkConf. The JAR is on the classpath automatically, so it keeps working even though spark.jars no longer appears in SparkConf.
spark.jars.packages: Maven coordinates are downloaded to $SPARK_HOME/jars/ and the property is stripped.
spark.files and spark.pyFiles: cloud paths are passed through to Spark unchanged. (Stock Spark names the Python-files property spark.submit.pyFiles.)

Properties stripped from SparkConf before the session is built

The platform reads several properties at bootstrap and then removes them from SparkConf before the Spark session is created — they will not appear in spark.sparkContext.getConf().getAll(). That's intentional and not a bug. The stripped set is: syntasa.python.*, syntasa.jar.*, spark.jars (cloud paths only — these are downloaded into $SPARK_HOME/jars/), and spark.jars.packages.

Where Spark configs live — global vs runtime template

The dependency properties above (and any Spark config in general) can be set in two places:

Global Spark config — set in Platform Settings. Written to spark-defaults.conf at kernel bootstrap. Picked up automatically by both Python and Scala kernels.
Runtime template Spark config — set on a runtime template. Merged into SparkConf at Spark session creation, and overrides the global value when the same property is set in both.

From your notebook's perspective, you just see the merged result of these two layers — the runtime's value if it's set, otherwise the global value.
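
The precedence rule can be modeled in a few lines of bash. The sketch below is an illustration under stated assumptions: two temporary files stand in for the global and runtime configs, the "key = value" layout follows the listings in this article, and lookup and resolve are hypothetical helper names. It models only which value wins, not how the platform performs the merge.

```shell
#!/bin/bash
# lookup FILE KEY: print the value of KEY from a "key = value" file, if present.
lookup() {
  awk -v k="$2" '
    { pos = index($0, "=") }
    pos > 0 {
      key = substr($0, 1, pos - 1); val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) found = val
    }
    END { if (found != "") print found }
  ' "$1"
}

# resolve KEY: the runtime template's value if set, otherwise the global value.
resolve() {
  local key="$1" value
  value="$(lookup "$RUNTIME_CONF" "$key")"
  [ -n "$value" ] || value="$(lookup "$GLOBAL_CONF" "$key")"
  printf '%s\n' "$value"
}

# Placeholder configs for the demonstration.
GLOBAL_CONF="$(mktemp)"; RUNTIME_CONF="$(mktemp)"
trap 'rm -f "$GLOBAL_CONF" "$RUNTIME_CONF"' EXIT
cat > "$GLOBAL_CONF" <<'EOF'
spark.executor.memory = 4g
spark.sql.shuffle.partitions = 200
EOF
cat > "$RUNTIME_CONF" <<'EOF'
spark.executor.memory = 8g
EOF

resolve spark.executor.memory          # prints 8g: the runtime value wins
resolve spark.sql.shuffle.partitions   # prints 200: falls back to the global value
```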

Platform Settings → Notebook Config → Script — the admin surface for the global init script.

Runtime template configuration showing the runtime init script field and the Spark config section.

Examples

Example 1 — Install Python packages from PyPI

Spark config
syntasa.python.enable.dependencies        = true
syntasa.python.dependencies.names.online  = humanize::tabulate

The driver and every executor will pip install humanize and tabulate.

Example 2 — Install an offline Python wheel

Upload your wheel to s3://your-bucket/syn-cluster-config/deps/python/my_lib-1.0-py3-none-any.whl, then set:

Spark config
syntasa.python.enable.dependencies = true
syntasa.python.dependencies.names  = my_lib-1.0-py3-none-any.whl

Example 3 — Pull a Maven JAR onto the classpath

Spark config
syntasa.jar.enable.dependencies        = true
syntasa.jar.dependencies.names.online  = org.xerial:sqlite-jdbc:3.45.1.0

Example 4 — Init script that sets an env var and downloads a JAR

Set as a global, runtime, or notebook init script:
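#!/bin/bash
# Connection string read by notebook cells and Spark UDFs.
export DATABASE_URL="jdbc:postgresql://db.internal:5432/analytics"

# Put the JAR on Spark's classpath; the path is quoted in case SPARK_HOME contains spaces.
wget -q -O "$SPARK_HOME/jars/jfiglet-0.0.9.jar" \
  "https://repo1.maven.org/maven2/com/github/lalyos/jfiglet/0.0.9/jfiglet-0.0.9.jar"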

This script runs on every kernel and every executor pod. The export propagates to the Spark executor processes via the executor bootstrap, so cells that read DATABASE_URL inside Spark UDFs get the value too.

Example 5 — Notebook init script that overrides a runtime setting

Set this as the per-notebook init script (toolbar editor):

#!/bin/bash
# The runtime template sets DATABASE_URL to staging. Override for this
# notebook only — it talks to the production replica.
export DATABASE_URL="jdbc:postgresql://db-prod-readonly.internal:5432/analytics"

Because the notebook layer runs last, this overrides the runtime's value for this notebook only — other notebooks attached to the same runtime keep the staging URL.

Logs and troubleshooting

Each script writes its output to a separate log file under /logs/ on the kernel pod. When something doesn't work — a package didn't install, a JAR isn't on the classpath, an env var isn't set — the log file for the offending layer is the first place to look. See Init Script Logs & Troubleshooting for the full list of log paths and the common failure modes.
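
As a first pass before opening individual files, you can search the whole log directory for error lines. This is a minimal sketch: scan_logs is a hypothetical helper, and it assumes failures show up as lines containing the word "error", which may not hold for every tool an init script calls.

```shell
#!/bin/bash
# scan_logs [DIR]: print every line containing "error" (case-insensitive)
# in files under DIR, prefixed with file name and line number.
# DIR defaults to /logs, the location named above. Hypothetical helper.
scan_logs() {
  local dir="${1:-/logs}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  grep -rin -- 'error' "$dir" || echo "no lines containing 'error' under $dir"
}
```

From a terminal on the kernel pod, calling scan_logs with no arguments searches /logs/; pass a different directory to search elsewhere.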
