synutils is the unified entry point in every notebook — a Python attribute and a Scala REPL alias. Each section below lists user-callable methods with examples in both languages.
connections — connection metadata + decrypted parameters
| Method | Purpose |
|---|---|
| get() | Full connection object (parameters auto-decrypted; encrypted fields wrapped in SecretString) |
| | Single parameter value as |
| | Single parameter value as |
| | All parameters — preserves |
| clearCache() | Drop cached responses |
Python
conn = synutils.connections.get("my_snowflake")
host = synutils.connections.getParam("my_snowflake", "host")
# Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()
Scala
val conn = synutils.connections.get("my_snowflake")
val host = synutils.connections.getParam("my_snowflake", "host")
// Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()
Redaction — encrypted fields render as **********
After decryption, encrypted parameter fields (passwords, API keys, secret keys, private keys — whichever fields apply per connection type) are wrapped in SecretString. They redact in print/log output but still work for SDK calls. Non-encrypted parameters (host, port, database, etc.) are returned as plain strings.
Python
SecretString is a str subclass; it redacts when printed and reveals when you call .get() / .unseal() (or, if an SDK accepts a str subclass directly, you can pass it as-is).
conn = synutils.connections.get("my_snowflake")
print(conn["parameters"])
# {'host': 'snow.example.com',
# 'database': 'analytics',
# 'username': 'svc_account',
# 'password': '**********',
# 'privateKey': '**********'}
# Plain field — no wrapping
print(conn["parameters"]["host"]) # snow.example.com
# Encrypted field — redacts in print, reveals via .get()
pw = conn["parameters"]["password"]
print(pw) # **********
print(f"pw={pw}") # pw=**********
print(pw.get()) # actual password
# Pass to an SDK (works because SecretString is a str subclass — but
# many SDKs call str() internally, which returns "**********". For those,
# call .get() explicitly to be safe.)
import snowflake.connector
conn_handle = snowflake.connector.connect(
user=conn["parameters"]["username"],
password=conn["parameters"]["password"].get(), # <- unseal for SDK
account=conn["parameters"]["account"],
)
Scala
SecretString has an implicit conversion to String, so JDBC and most SDKs work directly. Use .get() / .unseal() for explicit reveal.
val conn = synutils.connections.get("my_snowflake")
val params = conn("parameters").asInstanceOf[Map[String, Any]]
println(params)
// Map(host -> snow.example.com, database -> analytics,
// username -> svc_account, password -> **********, privateKey -> **********)
// Encrypted field — pattern-match for type-safe access
val pwd = params("password").asInstanceOf[SecretString]
println(pwd) // **********
println(pwd.get()) // actual password
// JDBC — implicit conversion fires automatically (no explicit unseal needed)
import java.sql.DriverManager
DriverManager.getConnection(jdbcUrl, "svc_account", pwd)
// String interpolation — type ascription forces the conversion
val safeUrl = s"jdbc:postgresql://host/${pwd: String}"
datasets — dataset registry + Hive table metadata
| Method | Purpose |
|---|---|
| get() | DataSet object (table name, partition cols, etc.) |
| create() | Register a new dataset |
| list() | All datasets under an event store ( |
DataSet object methods: tableName(), getPartitionColumns(), getNonPartitionColumns(), isPartitioned().
Python
# Fetch
ds = synutils.datasets.get("user_events")
print(ds.tableName(), ds.isPartitioned())
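# Partition metadata via the DataSet object methods listed above
print(ds.getPartitionColumns(), ds.getNonPartitionColumns())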
# Register a new dataset (default file format is PARQUET)
from synutils.file_format import FileFormat
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat=FileFormat.AVRO)
# List all datasets registered under an event store
# (devDatasets + prodDatasets combined; filter by 'environment' for one env)
for d in synutils.datasets.list("TestStore"):
print(d["name"], d["environment"], d["database"])
# Only PRODUCTION datasets
prod = [d for d in synutils.datasets.list("TestStore") if d["environment"] == "PRODUCTION"]
Scala
// Fetch
val ds = synutils.datasets.get("user_events")
println(s"${ds.tableName} ${ds.isPartitioned}")
// Register a new dataset (default file format is "parquet")
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat = "avro")
// List all datasets registered under an event store (dev + prod combined)
synutils.datasets.list("TestStore").foreach { d =>
println(s"${d("name")} ${d("environment")} ${d("database")}")
}
// Only PRODUCTION datasets
val prod = synutils.datasets.list("TestStore")
.filter(_("environment") == "PRODUCTION")
infrastructure — platform infra metadata
The full method surface, side by side. Use the canonical method in new code. Legacy aliases exist only for backward compatibility with pre-integration notebooks.
Properties (read-only attributes)
| Returns | Python | Scala | Legacy (Python) |
|---|---|---|---|
| Cloud provider — | providerType | providerType | |
| Cloud region | | | — |
| GCP project ID (empty for AWS/Azure) | | | — |
| Default storage bucket | bucket | bucket | — |
| Filesystem prefix ( | | | — |
| Full storage path ( | | | — |
| SSH connection type | | | — |
| Metastore type ( | | | — |
| Metastore hostname | | | — |
Section getters
| Returns | Python | Scala | Legacy (Python) |
|---|---|---|---|
| Full payload (all sections) | getAll() | getAll() | |
| Global init script ( | | | |
| Drop the cached payload | | | — |
Python returns dict everywhere; Scala returns Map[String, AnyRef]. Same shape, language-native types.
Secret fields (encrypted + sealed)
Encrypted fields are auto-decrypted by the platform and wrapped in SecretString so they redact in print / log output. Pull a value out of the dict and call .get() / .unseal() to reveal the decrypted plaintext.
| Section | Secret field names |
|---|---|
| security | accessKey, secretKey |
| metastore | metastorePassword |
| bigQuery | keyFile |
| redshift | password |
| kafka | sslKey |
| gitConfig | personalAccessToken |
| | |
If decryption fails (no key, malformed ciphertext) the original ciphertext is preserved and a warning is logged — print(meta["metastorePassword"]) still redacts to **********, but .get() returns the raw ciphertext rather than the plaintext.
The seven subsections above are not exposed as individual section getters; access them via getAll() / asDict() or via the existing getMetastore() / getSecurity() for those two.
Python
print(synutils.infrastructure.providerType, synutils.infrastructure.bucket)
print(synutils.infrastructure.getStorage()) # full storage dict
sec = synutils.infrastructure.getSecurity()
print(sec) # {'accessKey': '**********', 'secretKey': '**********', ...}
sec["secretKey"].get() # decrypted plaintext
meta = synutils.infrastructure.getMetastore()
meta["metastorePassword"].get() # decrypted plaintext
payload = synutils.infrastructure.getAll()
payload["infraProvider"]["security"]["secretKey"].get() # decrypted
payload["infraProvider"]["metastore"]["metastorePassword"].get() # decrypted
payload["infraProvider"]["bigQuery"]["keyFile"].get() # decrypted JSON
payload["infraProvider"]["redshift"]["password"].get() # decrypted
payload["infraProvider"]["kafka"]["sslKey"].get() # decrypted
payload["infraProvider"]["gitConfig"]["personalAccessToken"].get() # decryptedScala
SecretString is available as a bare name in notebook Scala sessions.
println(s"${synutils.infrastructure.providerType} ${synutils.infrastructure.bucket}")
println(synutils.infrastructure.getStorage()) // full storage Map
val sec = synutils.infrastructure.getSecurity()
println(sec) // values render as **********
// Map[String, AnyRef] preserves the seal — values aren't auto-unsealed.
// Cast to SecretString to access .get() / .unseal().
val secretKey = sec("secretKey").asInstanceOf[SecretString]
println(secretKey) // **********
println(secretKey.get()) // decrypted plaintext
val pwd = synutils.infrastructure.getMetastore()("metastorePassword")
.asInstanceOf[SecretString]
val raw: String = pwd // implicit conversion reveals (decrypted)
eventstores — event store path / database resolver
An event store is the primary data storage unit on the platform — it pairs a cloud storage path with a metastore database, with separate values per environment (development / production / snapshot).
| Method | Purpose |
|---|---|
| get() | Full event store object with resolved |
| getPath() | Storage path only |
| getDatabase() | Hive database name only |
| configure() | Set a default event store + environment so the |
| path / database / name | Properties that return the path / database / name of the default event store (requires |
| clearCache() | Drop the response cache |
lookupType selects how value is interpreted:
"name"—valueis the event store's logical name;envdecides which environment's path/database to return."database"—valueis a database name; the environment is auto-detected by matching againstdevelopmentDatabase/productionDatabasein the response (theenvargument is ignored).
Examples
Python
# 1. Lookup by name + environment
es = synutils.eventstores.get("click_stream", env="production")
print(es["path"], es["database"], es["_resolved_env"])
# gs://.../click_stream click_stream_prod production
# 2. Lookup by database — env auto-detected from the database value
es = synutils.eventstores.get("click_stream_dev", lookupType="database")
print(es["_resolved_env"]) # development
# 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env="development")
synutils.eventstores.getDatabase("click_stream", env="production")
# 4. Configure defaults — then use path / database / name as bare properties
synutils.eventstores.configure(defaultName="click_stream", defaultEnv="development")
print(synutils.eventstores.path)
print(synutils.eventstores.database)
print(synutils.eventstores.name)
# 5. Switch the default environment without changing the name
synutils.eventstores.configure(defaultEnv="production")
print(synutils.eventstores.path)
# 6. Drop cached responses (force a re-fetch on next call)
synutils.eventstores.clearCache()
Scala
// 1. Lookup by name + environment
val es = synutils.eventstores.get("click_stream", env = "production")
println(s"${es("path")} ${es("database")} ${es("_resolved_env")}")
// 2. Lookup by database — env auto-detected
val byDb = synutils.eventstores.get("click_stream_dev", lookupType = "database")
println(byDb("_resolved_env")) // development
// 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env = "development")
synutils.eventstores.getDatabase("click_stream", env = "production")
// 4. Configure defaults
synutils.eventstores.configure(defaultName = "click_stream", defaultEnv = "development")
println(synutils.eventstores.path)
println(synutils.eventstores.database)
println(synutils.eventstores.name)
// 5. Switch default environment
synutils.eventstores.configure(defaultEnv = "production")
println(synutils.eventstores.path)
// 6. Drop cached responses
synutils.eventstores.clearCache()
notifications — send platform notifications (email)
| Method | Purpose |
|---|---|
| send() | Send a notification |
recipients is a single email or comma-separated list (e.g. "a@x.com, b@y.com"). Set useDefaultHtmlTemplate=False to send the message body verbatim (skip the server's default HTML wrapper).
Python
# 1. Plain file path — simplest form
synutils.notifications.send(
recipients="a@x.com, b@y.com",
subject="Daily report",
message="See attached",
attachments=["/tmp/report.pdf"],
)
# 2. In-memory bytes via (filename, bytes) tuple
csv_bytes = b"id,name\n1,alice\n"
synutils.notifications.send(
recipients="a@x.com",
subject="Daily export",
message="Attached CSV",
attachments=[("daily.csv", csv_bytes)],
)
# 3. In-memory bytes with explicit content type — (filename, bytes, mimetype)
import json
data = {"ok": True}
synutils.notifications.send(
recipients="a@x.com",
subject="Run status",
message="See attached JSON",
attachments=[("run_status.json", json.dumps(data).encode(), "application/json")],
)
# 4. Mix of file path and in-memory content
synutils.notifications.send(
recipients="a@x.com",
subject="Job done",
message="Logs + summary",
attachments=[
"/var/log/job.log", # path
("summary.csv", b"job,seconds
foo,42
"), # bytes
],
)
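# Send the body verbatim — skip the server's default HTML wrapper
# (sketch of the useDefaultHtmlTemplate flag described above)
synutils.notifications.send(
    recipients="a@x.com",
    subject="Raw HTML body",
    message="<h1>Done</h1><p>Job finished.</p>",
    useDefaultHtmlTemplate=False,
)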
# 5. File-like object (e.g. io.BytesIO) also works as the second tuple element
import io
buf = io.BytesIO()
buf.write(b"col_a,col_b\n1,2\n")
synutils.notifications.send(
recipients="a@x.com",
subject="Buffered results",
message="Attached",
attachments=[("results.csv", buf.getvalue())],
)
Scala
// 1. Plain file path — simplest form (matches Python)
synutils.notifications.send(
recipients = "a@x.com, b@y.com",
subject = "Daily report",
message = "See attached",
attachments = Seq("/tmp/report.pdf")
)
// 2. In-memory bytes via tuple
val csvBytes = "id,name\n1,alice\n".getBytes("UTF-8")
synutils.notifications.send(
recipients = "a@x.com",
subject = "Daily export",
message = "Attached CSV",
attachments = Seq(("daily.csv", csvBytes, "text/csv"))
)
// 3. Mix of file path and in-memory content
synutils.notifications.send(
recipients = "a@x.com",
subject = "Job done",
message = "Logs + summary",
attachments = Seq(
"/var/log/job.log",
("summary.txt", "Job completed in 42s".getBytes("UTF-8"), "text/plain")
)
)
// 4. Pre-built Attachment objects also still work
synutils.notifications.send(
recipients = "a@x.com",
subject = "x", message = "y",
attachments = Seq(Attachment.fromFile("/tmp/file.csv"))
)
credentials — secret store lookup
| Method | Purpose |
|---|---|
| get() | Fetch a single secret value — returns SecretString |
| getAll() | All secrets under a name — returns SecretDict (Python) / SecretMap (Scala) |
| describe() | Credential description |
| | Raw metadata |
| list() | All credentials |
Basic usage
# Python
pwd = synutils.credentials.get("svc-account-a", "password")
all_secrets = synutils.credentials.getAll("svc-account-a")
desc = synutils.credentials.describe("svc-account-a")
print(synutils.credentials.list())
// Scala
val pwd = synutils.credentials.get("svc-account-a", "password")
val all = synutils.credentials.getAll("svc-account-a")
val desc = synutils.credentials.describe("svc-account-a")
println(synutils.credentials.list())
Redaction — values print as ********** by default
get() returns a SecretString; getAll() returns a SecretDict / SecretMap. Both override their string representation so the raw value never leaks into notebook output, logs, or exceptions.
Python
secret = synutils.credentials.get("svc-account-a", "password")
# All of these print **********
print(secret) # **********
repr(secret) # **********
f"connecting with pw={secret}" # connecting with pw=**********
"%s" % secret # **********
secret # ********** (Jupyter cell last line)
# Logging is safe too
logger.info("got %s", secret) # got **********
# Exception messages don't leak
raise ValueError(secret) # ValueError: **********
# getAll() — every value is redacted
all_secrets = synutils.credentials.getAll("svc-account-a")
print(all_secrets)
# {'user': '**********', 'password': '**********', 'host': '**********'}
Scala
val secret = synutils.credentials.get("svc-account-a", "password")
// All of these print **********
println(secret) // **********
secret.toString // **********
println(s"pw=$secret") // pw=**********
secret // ********** (REPL last expression)
// getAll() — every value is redacted
val all = synutils.credentials.getAll("svc-account-a")
println(all)
// Map(user -> **********, password -> **********, host -> **********)
Unseal — getting the raw value for SDK calls
Use .get() (preferred) or .unseal() (legacy alias) on a single SecretString. For a whole SecretDict/SecretMap, use .unseal() to get a plain dict / Map[String, String].
Python — pass to SDKs
import boto3
# Single value — call .get() or .unseal()
secret_key = synutils.credentials.get("aws_creds", "secret_access_key")
client = boto3.client("s3", aws_secret_access_key=secret_key.get())
# Whole bag — .unseal() returns a regular dict with raw strings
raw = synutils.credentials.getAll("aws_creds").unseal()
client = boto3.client(
"s3",
aws_access_key_id=raw["access_key"],
aws_secret_access_key=raw["secret_key"],
)
Scala — implicit conversion to String for JDBC / SDKs
import java.sql.DriverManager
val pwd = synutils.credentials.get("db_creds", "password")
// JDBC — implicit conversion to String works directly
val conn = DriverManager.getConnection(jdbcUrl, "user", pwd)
// String interpolation — use type ascription to force the conversion
val urlWithPwd = s"jdbc:postgresql://host/${pwd: String}"
// Or explicit unseal / get
val raw: String = pwd.get() // preferred
val raw2: String = pwd.unseal() // legacy alias
// Whole bag — .unseal() / .getAll() returns Map[String, String]
val rawMap: Map[String, String] = synutils.credentials.getAll("db_creds").unseal()
Heads-up — string manipulation reveals the value.
SecretString is a string subclass (Python) / has an implicit String conversion (Scala), so anything that touches the underlying characters bypasses redaction:
# Python — these all leak the raw value
"prefix-" + secret # leaks
secret + "-suffix" # leaks
secret[:5] # leaks
secret.encode() # leaks
secret.upper() # leaks
// Scala — these force the implicit conversion and leak
val s: String = secret // leaks (type ascription)
"prefix-" + secret // leaks (implicit conversion)
Rule of thumb: printing/logging is safe; string manipulation reveals the value. Prefer f-strings / s-interpolation over + concatenation when building log lines.
files — platform file registry
A file object in the platform pairs a base cloud-storage path with a list of files (the object's parameters). synutils.files resolves these to full cloud paths and — for DATA_FILE objects in supported formats — reads them directly into a Spark DataFrame.
Methods
| Method | Purpose |
|---|---|
| get() | Full file object dict / Map from the API (cached per name) |
| getPath() | Full cloud paths for all files in this object — returns |
| getMetadata() | Curated subset of metadata with renamed keys |
| createDataFrame() | Read a registered file into a Spark DataFrame |
| clearCache() | Drop cached responses (force a re-fetch on next call) |
createDataFrame parameters
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| | String | — | File object name registered in the platform |
| fileName | String | — | Specific file within the object — must match one of the entries in the object's |
| sep | String | | Column separator. Used only for |
| header | Boolean | | First row is a header. Used only for |
| inferSchema | Boolean | | Infer column types from data. Used only for |
Raises:
- RuntimeError (Py) / SynUtilsException (Scala) if no SparkSession was supplied to init().
- ValueError (Py) / IllegalArgumentException (Scala) if the object is not a DATA_FILE or its fileFormat is unsupported.
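For example, a notebook cell can guard a read against the second case (a minimal Python sketch; the object and file names are reused from the examples below):
try:
    df = synutils.files.createDataFrame("daily_report", "sales.csv")
except ValueError as err:
    # raised when the object is not a DATA_FILE or its fileFormat is unsupported
    print(f"Cannot load as a DataFrame: {err}")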
Examples
Python
# 1. Inspect a file object
info = synutils.files.get("daily_report")
print(info["objectTypeKey"], info["fileFormat"])
# 2. Get full cloud paths for every file in the object
paths = synutils.files.getPath("daily_report")
# ['gs://my-bucket/reports/sales.csv', 'gs://my-bucket/reports/orders.csv']
# 3. Curated metadata subset
meta = synutils.files.getMetadata("daily_report")
# 4. Read one file as a DataFrame (uses the object's configured delimiter)
df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)
# 5. Override CSV options for this read only
df = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep="\t",
header=False,
inferSchema=False, )
# 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
events = synutils.files.createDataFrame("event_dump", "events.json")
sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
records = synutils.files.createDataFrame("user_records", "users.avro")
# Note: AVRO requires the matching spark-avro JAR loaded into the runtime.
# 7. Drop cached responses (force a re-fetch on next call)
synutils.files.clearCache()
Scala
// 1. Inspect a file object
val info = synutils.files.get("daily_report")
println(s"${info("objectTypeKey")} ${info("fileFormat")}")
// 2. Full cloud paths for every file in the object
val paths: List[String] = synutils.files.getPath("daily_report")
// 3. Curated metadata subset
val meta = synutils.files.getMetadata("daily_report")
println(meta("type"), meta("fileFormat"))
// 4. Read one file as a DataFrame (uses the object's configured delimiter)
val df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)
// 5. Override CSV options for this read only
val tsv = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep = "\t",
header = false,
inferSchema = false )
// 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
val events = synutils.files.createDataFrame("event_dump", "events.json")
val sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
val orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
val records = synutils.files.createDataFrame("user_records", "users.avro")
// 7. Drop cached responses
synutils.files.clearCache()
Tip:
- getPath() returns paths for all files in the object — useful when you want Spark to read everything in one go via spark.read.csv(synutils.files.getPath("daily_report")).
- createDataFrame() reads exactly one file at a time, identified by fileName.
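A minimal sketch of the first bullet, assuming a live SparkSession named spark and the daily_report object from the examples above:
# Read every file registered under the object in one Spark job
paths = synutils.files.getPath("daily_report") # full cloud paths for every file
df_all = spark.read.csv(paths, header=True, inferSchema=True)
df_all.show(5)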
fs — direct object-store filesystem (S3 / GCS / Azure / HDFS)
The synutils.fs object auto-routes by URI scheme.
Methods
| Method | Purpose |
|---|---|
| ls() | List entries (non-recursive) |
| | Recursive list |
| exists() | True if file or folder-prefix exists |
| upload() | Upload single file |
| | Download single file |
| | Recursive upload |
| | Recursive download |
| | Server-side copy |
| | Move / rename |
| | Rename in place (same bucket/container) |
| | Delete file or prefix |
| | Create directory marker |
| | Read text content |
| | Read first N bytes as text |
| | Write a text file |
| | Lazy read stream for large files |
| | Upload from a file-like object |
Legacy aliases (Python only)
These names still work — they delegate to their canonical counterpart. Use the canonical name in new code.
Canonical | Legacy name |
|---|---|
Examples
Python
synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
print(synutils.fs.exists("gs://my-bucket/remote.csv"))
print(synutils.fs.ls("gs://my-bucket/"))
Scala
synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
println(synutils.fs.exists("gs://my-bucket/remote.csv"))
println(synutils.fs.ls("gs://my-bucket/"))
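Because routing is by URI scheme, the same calls work unchanged across stores — a short Python sketch using only the methods shown above (bucket names are placeholders):
# Same API, different backends — the scheme on the URI picks the store
for uri in ["gs://my-bucket/remote.csv", "s3://other-bucket/remote.csv"]:
    if not synutils.fs.exists(uri):
        synutils.fs.upload("local.csv", uri)
print(synutils.fs.ls("s3://other-bucket/"))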
spark — dataset → DataFrame + write helpers (requires Spark)
| Method | Purpose |
|---|---|
| createDataFrame() | Read dataset into a DataFrame |
| | True if Hive table exists |
| | Create Hive table from DataFrame |
| writeToEventStore() | Write a DataFrame to an event store. Auto-creates the dataset if it doesn't exist. |
| writeDatasetToEventStore() | Convenience wrapper around writeToEventStore() |
| | Push a local file into the event store |
Python
df = synutils.spark.createDataFrame("user_events")
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")Scala
val df = synutils.spark.createDataFrame("user_events")
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")writeToEventStore — full signature
Writes a DataFrame to a Hive-managed event store table. If the dataset doesn't already exist in the platform's metadata, it is created on the fly using the supplied fileFormat. Partitioning, compression, and process-mode handling are derived from the dataset definition; this method only orchestrates the Spark-side write.
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| | DataFrame | — | Source DataFrame to write |
| | String | — | Dataset name in |
| numPartitions | Int | | If > 0 and the dataset is partitioned, adds |
| partitionedDateColumn | String | | Override the dataset's configured partition column. If set, the dataset is updated to partition by this column before writing |
| | Boolean | | |
| | Boolean | | When |
| fileFormat | FileFormat / String | | Used only when the dataset must be created (404 from the dataset API). Ignored if the dataset already exists. Supported: |
Python
from synutils.file_format import FileFormat
# Minimal write — dataset must already exist
synutils.spark.writeToEventStore(df, "analytics.user_events")
# Append (don't overwrite existing partitions)
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite=False)
# Control output file count per partition (e.g. 8 files per partition)
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions=8)
# Override the partition column for this write only
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn="event_date")
# Auto-create as Avro if dataset doesn't exist yet
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat=FileFormat.AVRO)
# Preserve existing table definition (don't recreate)
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode=False)
Scala
import com.syntasa.synutils.FileFormat
// Minimal write
synutils.spark.writeToEventStore(df, "analytics.user_events")
// Append
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite = false)
// Control output file count per partition
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions = 8)
// Override partition column for this write
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn = "event_date")
// Auto-create as Avro
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat = FileFormat.AVRO.getValue())
// Preserve existing table definition
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode = false)
Heads-up: isOverwrite=True overwrites at the partition level for partitioned tables, not the entire table. For non-partitioned tables it overwrites the whole table.
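A minimal Python sketch of the partitioned case (the event_date column and filter are illustrative; the real partition column comes from the dataset definition):
# Only the 2024-01-01 partition is present in this DataFrame, so with
# isOverwrite=True only that partition is replaced — other partitions are untouched.
day_df = df.filter(df.event_date == "2024-01-01")
synutils.spark.writeToEventStore(day_df, "analytics.user_events", isOverwrite=True)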
lib — install packages and JARs
Python — installPyPI / installCondaPackage
| Method | Purpose |
|---|---|
| installPyPI() | pip install. Auto-downloads |
| installCondaPackage() | conda install (channel |
Set acrossAllNodes=False to install on the kernel only (skip distribution to Spark worker nodes).
# 1. Regular PyPI package
synutils.lib.installPyPI("requests")
# 2. Multiple packages
synutils.lib.installPyPI("requests pandas>=2.0 numpy")
# 3. Local zip from the cluster's deps/python folder
# (resolves to <bucket>/<config-folder>/deps/python/simple_module.zip)
synutils.lib.installPyPI("simple_module.zip")
# 4. Full cloud path (s3://, gs://, or https://)
synutils.lib.installPyPI("s3://syntasa-k3s-dev/pradeepm/python_modules/simple_module.zip")
# 5. Mix everything in one call
synutils.lib.installPyPI("requests simple_module.zip s3://syntasa-k3s-dev/pradeepm/python_modules/other.whl")
# 6. Kernel-only install (don't distribute to Spark workers)
synutils.lib.installPyPI("requests", acrossAllNodes=False)
# 7. conda-forge package
synutils.lib.installCondaPackage("scipy")Scala — installJars
Install JARs into the running Scala kernel.
Accepts three source styles, mixed freely in a single space-delimited call:
| Source | Example |
|---|---|
| Maven coordinates | org.joda:joda-money:1.0.4 |
| Filename in the cluster's deps folder ( | greeterWithDollar.jar |
| Full cloud-storage path | gs://syn-400-development-kub/pradeepm/greeter.jar |
// Single source
synutils.lib.installJars("org.joda:joda-money:1.0.4")
synutils.lib.installJars("greeterWithDollar.jar")
synutils.lib.installJars("gs://syn-400-development-kub/pradeepm/greeter.jar")
// Multiple, space-delimited
synutils.lib.installJars("org.joda:joda-money:1.0.4 greeterWithDollar.jar gs://syn-400-development-kub/pradeepm/greeter.jar")