synutils is the unified entry point in every notebook — a Python attribute and a Scala REPL alias. Each section below lists user-callable methods with examples in both languages.
connections — connection metadata + decrypted parameters
| Method | Purpose |
|---|---|
| get() | Full connection object (parameters auto-decrypted; encrypted fields wrapped in SecretString) |
| | Single parameter value as |
| | Single parameter value as |
| | All parameters — preserves |
| clearCache() | Drop cached responses |
Python
conn = synutils.connections.get("my_snowflake")
host = synutils.connections.getParam("my_snowflake", "host")
# Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()
Scala
val conn = synutils.connections.get("my_snowflake")
val host = synutils.connections.getParam("my_snowflake", "host")
// Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()
Redaction — encrypted fields render as **********
After decryption, encrypted parameter fields (passwords, API keys, secret keys, private keys — whichever fields apply per connection type) are wrapped in SecretString. They redact in print/log output but still work for SDK calls. Non-encrypted parameters (host, port, database, etc.) are returned as plain strings.
Python
SecretString is a str subclass; it redacts when printed and reveals when you call .get() / .unseal() (or, if an SDK accepts a str subclass directly, you can pass it as-is).
conn = synutils.connections.get("my_snowflake")
print(conn["parameters"])
# {'host': 'snow.example.com',
# 'database': 'analytics',
# 'username': 'svc_account',
# 'password': '**********',
# 'privateKey': '**********'}
# Plain field — no wrapping
print(conn["parameters"]["host"]) # snow.example.com
# Encrypted field — redacts in print, reveals via .get()
pw = conn["parameters"]["password"]
print(pw) # **********
print(f"pw={pw}") # pw=**********
print(pw.get()) # actual password
# Pass to an SDK (works because SecretString is a str subclass — but
# many SDKs call str() internally, which returns "**********". For those,
# call .get() explicitly to be safe.)
import snowflake.connector
conn_handle = snowflake.connector.connect(
user=conn["parameters"]["username"],
password=conn["parameters"]["password"].get(), # <- unseal for SDK
account=conn["parameters"]["account"],
)
Scala
SecretString has an implicit conversion to String, so JDBC and most SDKs work directly. Use .get() / .unseal() for explicit reveal.
val conn = synutils.connections.get("my_snowflake")
val params = conn("parameters").asInstanceOf[Map[String, Any]]
println(params)
// Map(host -> snow.example.com, database -> analytics,
// username -> svc_account, password -> **********, privateKey -> **********)
// Encrypted field — pattern-match for type-safe access
val pwd = params("password").asInstanceOf[SecretString]
println(pwd) // **********
println(pwd.get()) // actual password
// JDBC — implicit conversion fires automatically (no explicit unseal needed)
import java.sql.DriverManager
DriverManager.getConnection(jdbcUrl, "svc_account", pwd)
// String interpolation — type ascription forces the conversion
val safeUrl = s"jdbc:postgresql://host/${pwd: String}"
datasets — dataset registry + Hive table metadata
| Method | Purpose |
|---|---|
| get() | DataSet object (table name, partition cols, etc.) |
| create() | Register a new dataset |
| list() | All datasets under an event store ( |
DataSet object methods: tableName(), getPartitionColumns(), getNonPartitionColumns(), isPartitioned().
Python
# Fetch
ds = synutils.datasets.get("user_events")
print(ds.tableName(), ds.isPartitioned())
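# Partition metadata via the DataSet object methods listed above
print(ds.getPartitionColumns(), ds.getNonPartitionColumns())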
# Register a new dataset (default file format is PARQUET)
from synutils.file_format import FileFormat
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat=FileFormat.AVRO)
# List all datasets registered under an event store
# (devDatasets + prodDatasets combined; filter by 'environment' for one env)
for d in synutils.datasets.list("TestStore"):
print(d["name"], d["environment"], d["database"])
# Only PRODUCTION datasets
prod = [d for d in synutils.datasets.list("TestStore") if d["environment"] == "PRODUCTION"]
Scala
// Fetch
val ds = synutils.datasets.get("user_events")
println(s"${ds.tableName} ${ds.isPartitioned}")
// Register a new dataset (default file format is "parquet")
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat = "avro")
// List all datasets registered under an event store (dev + prod combined)
synutils.datasets.list("TestStore").foreach { d =>
println(s"${d("name")} ${d("environment")} ${d("database")}")
}
// Only PRODUCTION datasets
val prod = synutils.datasets.list("TestStore")
.filter(_("environment") == "PRODUCTION")
infrastructure — platform infra metadata
The full method surface, side by side. Use the canonical method in new code. Legacy aliases exist only for backward compatibility with pre-integration notebooks.
Properties (read-only attributes)
| Returns | Python | Scala | Legacy (Python) |
|---|---|---|---|
| Cloud provider — | providerType | providerType | |
| Cloud region | | | — |
| GCP project ID (empty for AWS/Azure) | | | — |
| Default storage bucket | bucket | bucket | — |
| Filesystem prefix ( | | | — |
| Full storage path ( | | | — |
| SSH connection type | | | — |
| Metastore type ( | | | — |
| Metastore hostname | | | — |
Section getters
| Returns | Python | Scala | Legacy (Python) |
|---|---|---|---|
| Full payload (all sections) | getAll() | getAll() | |
| Global init script ( | | | |
| Drop the cached payload | | | — |
Python returns dict everywhere; Scala returns Map[String, AnyRef]. Same shape, language-native types.
Secret fields (encrypted + sealed)
Encrypted fields are auto-decrypted by the platform and wrapped in SecretString so they redact in print / log output. Pull a value out of the dict and call .get() / .unseal() to reveal the decrypted plaintext.
| Section | Secret field names |
|---|---|
| security | accessKey, secretKey |
| metastore | metastorePassword |
| bigQuery | keyFile |
| redshift | password |
| kafka | sslKey |
| gitConfig | personalAccessToken |
| | |
If decryption fails (no key, malformed ciphertext) the original ciphertext is preserved and a warning is logged — print(meta["metastorePassword"]) still redacts to **********, but .get() returns the raw ciphertext rather than the plaintext.
The seven subsections above are not exposed as individual section getters; access them via getAll() / asDict() or via the existing getMetastore() / getSecurity() for those two.
Python
print(synutils.infrastructure.providerType, synutils.infrastructure.bucket)
print(synutils.infrastructure.getStorage()) # full storage dict
sec = synutils.infrastructure.getSecurity()
print(sec) # {'accessKey': '**********', 'secretKey': '**********', ...}
sec["secretKey"].get() # decrypted plaintext
meta = synutils.infrastructure.getMetastore()
meta["metastorePassword"].get() # decrypted plaintext
payload = synutils.infrastructure.getAll()
payload["infraProvider"]["security"]["secretKey"].get() # decrypted
payload["infraProvider"]["metastore"]["metastorePassword"].get() # decrypted
payload["infraProvider"]["bigQuery"]["keyFile"].get() # decrypted JSON
payload["infraProvider"]["redshift"]["password"].get() # decrypted
payload["infraProvider"]["kafka"]["sslKey"].get() # decrypted
payload["infraProvider"]["gitConfig"]["personalAccessToken"].get() # decryptedScala
SecretString is available as a bare name in notebook Scala sessions.
println(s"${synutils.infrastructure.providerType} ${synutils.infrastructure.bucket}")
println(synutils.infrastructure.getStorage()) // full storage Map
val sec = synutils.infrastructure.getSecurity()
println(sec) // values render as **********
// Map[String, AnyRef] preserves the seal — values aren't auto-unsealed.
// Cast to SecretString to access .get() / .unseal().
val secretKey = sec("secretKey").asInstanceOf[SecretString]
println(secretKey) // **********
println(secretKey.get()) // decrypted plaintext
val pwd = synutils.infrastructure.getMetastore()("metastorePassword")
.asInstanceOf[SecretString]
val raw: String = pwd // implicit conversion reveals (decrypted)
eventstores — event store path / database resolver
An event store is the primary data storage unit on the platform — it pairs a cloud storage path with a metastore database, with separate values per environment (development / production / snapshot).
| Method | Purpose |
|---|---|
| get() | Full event store object with resolved |
| getPath() | Storage path only |
| getDatabase() | Hive database name only |
| configure() | Set a default event store + environment so the |
| path / database / name | Properties that return the path / database / name of the default event store (requires |
| clearCache() | Drop the response cache |
lookupType selects how value is interpreted:
"name"—valueis the event store's logical name;envdecides which environment's path/database to return."database"—valueis a database name; the environment is auto-detected by matching againstdevelopmentDatabase/productionDatabasein the response (theenvargument is ignored).
Examples
Python
# 1. Lookup by name + environment
es = synutils.eventstores.get("click_stream", env="production")
print(es["path"], es["database"], es["_resolved_env"])
# gs://.../click_stream click_stream_prod production
# 2. Lookup by database — env auto-detected from the database value
es = synutils.eventstores.get("click_stream_dev", lookupType="database")
print(es["_resolved_env"]) # development
# 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env="development")
synutils.eventstores.getDatabase("click_stream", env="production")
# 4. Configure defaults — then use path / database / name as bare properties
synutils.eventstores.configure(defaultName="click_stream", defaultEnv="development")
print(synutils.eventstores.path)
print(synutils.eventstores.database)
print(synutils.eventstores.name)
# 5. Switch the default environment without changing the name
synutils.eventstores.configure(defaultEnv="production")
print(synutils.eventstores.path)
# 6. Drop cached responses (force a re-fetch on next call)
synutils.eventstores.clearCache()
Scala
// 1. Lookup by name + environment
val es = synutils.eventstores.get("click_stream", env = "production")
println(s"${es("path")} ${es("database")} ${es("_resolved_env")}")
// 2. Lookup by database — env auto-detected
val byDb = synutils.eventstores.get("click_stream_dev", lookupType = "database")
println(byDb("_resolved_env")) // development
// 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env = "development")
synutils.eventstores.getDatabase("click_stream", env = "production")
// 4. Configure defaults
synutils.eventstores.configure(defaultName = "click_stream", defaultEnv = "development")
println(synutils.eventstores.path)
println(synutils.eventstores.database)
println(synutils.eventstores.name)
// 5. Switch default environment
synutils.eventstores.configure(defaultEnv = "production")
println(synutils.eventstores.path)
// 6. Drop cached responses
synutils.eventstores.clearCache()
notifications — send platform notifications (email)
| Method | Purpose |
|---|---|
| send() | Send a notification |
recipients is a single email or comma-separated list (e.g. "a@x.com, b@y.com"). Set useDefaultHtmlTemplate=False to send the message body verbatim (skip the server's default HTML wrapper).
Python
# 1. Plain file path — simplest form
synutils.notifications.send(
recipients="a@x.com, b@y.com",
subject="Daily report",
message="See attached",
attachments=["/tmp/report.pdf"],
)
# 2. In-memory bytes via (filename, bytes) tuple
csv_bytes = b"id,name\n1,alice\n"
synutils.notifications.send(
recipients="a@x.com",
subject="Daily export",
message="Attached CSV",
attachments=[("daily.csv", csv_bytes)],
)
# 3. In-memory bytes with explicit content type — (filename, bytes, mimetype)
import json
data = {"ok": True}
synutils.notifications.send(
recipients="a@x.com",
subject="Run status",
message="See attached JSON",
attachments=[("run_status.json", json.dumps(data).encode(), "application/json")],
)
# 4. Mix of file path and in-memory content
synutils.notifications.send(
recipients="a@x.com",
subject="Job done",
message="Logs + summary",
attachments=[
"/var/log/job.log", # path
("summary.csv", b"job,seconds
foo,42
"), # bytes
],
)
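# Send the body verbatim — skip the server's default HTML wrapper
# (sketch of the useDefaultHtmlTemplate flag described above)
synutils.notifications.send(
    recipients="a@x.com",
    subject="Raw HTML body",
    message="<h1>Done</h1><p>Job finished.</p>",
    useDefaultHtmlTemplate=False,
)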
# 5. File-like object (e.g. io.BytesIO) also works as the second tuple element
import io
buf = io.BytesIO()
buf.write(b"col_a,col_b\n1,2\n")
synutils.notifications.send(
recipients="a@x.com",
subject="Buffered results",
message="Attached",
attachments=[("results.csv", buf.getvalue())],
)
Scala
// 1. Plain file path — simplest form (matches Python)
synutils.notifications.send(
recipients = "a@x.com, b@y.com",
subject = "Daily report",
message = "See attached",
attachments = Seq("/tmp/report.pdf")
)
// 2. In-memory bytes via tuple
val csvBytes = "id,name\n1,alice\n".getBytes("UTF-8")
synutils.notifications.send(
recipients = "a@x.com",
subject = "Daily export",
message = "Attached CSV",
attachments = Seq(("daily.csv", csvBytes, "text/csv"))
)
// 3. Mix of file path and in-memory content
synutils.notifications.send(
recipients = "a@x.com",
subject = "Job done",
message = "Logs + summary",
attachments = Seq(
"/var/log/job.log",
("summary.txt", "Job completed in 42s".getBytes("UTF-8"), "text/plain")
)
)
// 4. Pre-built Attachment objects also still work
synutils.notifications.send(
recipients = "a@x.com",
subject = "x", message = "y",
attachments = Seq(Attachment.fromFile("/tmp/file.csv"))
)
credentials — secret store lookup
| Method | Purpose |
|---|---|
| get() | Fetch a single secret value — returns SecretString |
| getAll() | All secrets under a name — returns SecretDict (Python) / SecretMap (Scala) |
| describe() | Credential description |
| | Raw metadata |
| list() | All credentials |
Basic usage
# Python
pwd = synutils.credentials.get("svc-account-a", "password")
all_secrets = synutils.credentials.getAll("svc-account-a")
desc = synutils.credentials.describe("svc-account-a")
print(synutils.credentials.list())
// Scala
val pwd = synutils.credentials.get("svc-account-a", "password")
val all = synutils.credentials.getAll("svc-account-a")
val desc = synutils.credentials.describe("svc-account-a")
println(synutils.credentials.list())
Redaction — values print as ********** by default
get() returns a SecretString; getAll() returns a SecretDict / SecretMap. Both override their string representation so the raw value never leaks into notebook output, logs, or exceptions.
Python
secret = synutils.credentials.get("svc-account-a", "password")
# All of these print **********
print(secret) # **********
repr(secret) # **********
f"connecting with pw={secret}" # connecting with pw=**********
"%s" % secret # **********
secret # ********** (Jupyter cell last line)
# Logging is safe too
logger.info("got %s", secret) # got **********
# Exception messages don't leak
raise ValueError(secret) # ValueError: **********
# getAll() — every value is redacted
all_secrets = synutils.credentials.getAll("svc-account-a")
print(all_secrets)
# {'user': '**********', 'password': '**********', 'host': '**********'}
Scala
val secret = synutils.credentials.get("svc-account-a", "password")
// All of these print **********
println(secret) // **********
secret.toString // **********
println(s"pw=$secret") // pw=**********
secret // ********** (REPL last expression)
// getAll() — every value is redacted
val all = synutils.credentials.getAll("svc-account-a")
println(all)
// Map(user -> **********, password -> **********, host -> **********)
Unseal — getting the raw value for SDK calls
Use .get() (preferred) or .unseal() (legacy alias) on a single SecretString. For a whole SecretDict/SecretMap, use .unseal() to get a plain dict / Map[String, String].
Python — pass to SDKs
import boto3
# Single value — call .get() or .unseal()
secret_key = synutils.credentials.get("aws_creds", "secret_access_key")
client = boto3.client("s3", aws_secret_access_key=secret_key.get())
# Whole bag — .unseal() returns a regular dict with raw strings
raw = synutils.credentials.getAll("aws_creds").unseal()
client = boto3.client(
"s3",
aws_access_key_id=raw["access_key"],
aws_secret_access_key=raw["secret_key"],
)
Scala — implicit conversion to String for JDBC / SDKs
import java.sql.DriverManager
val pwd = synutils.credentials.get("db_creds", "password")
// JDBC — implicit conversion to String works directly
val conn = DriverManager.getConnection(jdbcUrl, "user", pwd)
// String interpolation — use type ascription to force the conversion
val urlWithPwd = s"jdbc:postgresql://host/${pwd: String}"
// Or explicit unseal / get
val raw: String = pwd.get() // preferred
val raw2: String = pwd.unseal() // legacy alias
// Whole bag — .unseal() / .getAll() returns Map[String, String]
val rawMap: Map[String, String] = synutils.credentials.getAll("db_creds").unseal()
Heads-up — string manipulation reveals the value.
SecretString is a string subclass (Python) / has an implicit String conversion (Scala), so anything that touches the underlying characters bypasses redaction:
# Python — these all leak the raw value
"prefix-" + secret # leaks
secret + "-suffix" # leaks
secret[:5] # leaks
secret.encode() # leaks
secret.upper() # leaks
// Scala — these force the implicit conversion and leak
val s: String = secret // leaks (type ascription)
"prefix-" + secret // leaks (implicit conversion)
Rule of thumb: printing/logging is safe; string manipulation reveals the value. Prefer f-strings / s-interpolation over + concatenation when building log lines.
files — platform file registry
A file object in the platform pairs a base cloud-storage path with a list of files (the object's parameters). synutils.files resolves these to full cloud paths and — for DATA_FILE objects in supported formats — reads them directly into a Spark DataFrame.
Methods
| Method | Purpose |
|---|---|
| get() | Full file object dict / Map from the API (cached per name) |
| getPath() | Full cloud paths for all files in this object — returns |
| getMetadata() | Curated subset of metadata with renamed keys |
| createDataFrame() | Read a registered file into a Spark DataFrame |
| clearCache() | Drop cached responses (force a re-fetch on next call) |
createDataFrame parameters
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| | String | — | File object name registered in the platform |
| fileName | String | — | Specific file within the object — must match one of the entries in the object's |
| sep | String | | Column separator. Used only for |
| header | Boolean | | First row is a header. Used only for |
| inferSchema | Boolean | | Infer column types from data. Used only for |
Raises:
- RuntimeError (Py) / SynUtilsException (Scala) if no SparkSession was supplied to init().
- ValueError (Py) / IllegalArgumentException (Scala) if the object is not a DATA_FILE or its fileFormat is unsupported.
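For example, a notebook cell can guard a read against the second case (a minimal Python sketch; the object and file names are reused from the examples below):
try:
    df = synutils.files.createDataFrame("daily_report", "sales.csv")
except ValueError as err:
    # raised when the object is not a DATA_FILE or its fileFormat is unsupported
    print(f"Cannot load as a DataFrame: {err}")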
Examples
Python
# 1. Inspect a file object
info = synutils.files.get("daily_report")
print(info["objectTypeKey"], info["fileFormat"])
# 2. Get full cloud paths for every file in the object
paths = synutils.files.getPath("daily_report")
# ['gs://my-bucket/reports/sales.csv', 'gs://my-bucket/reports/orders.csv']
# 3. Curated metadata subset
meta = synutils.files.getMetadata("daily_report")
# 4. Read one file as a DataFrame (uses the object's configured delimiter)
df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)
# 5. Override CSV options for this read only
df = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep="\t",
header=False,
inferSchema=False, )
# 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
events = synutils.files.createDataFrame("event_dump", "events.json")
sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
records = synutils.files.createDataFrame("user_records", "users.avro")
# Note: AVRO requires the matching spark-avro JAR loaded into the runtime.
# 7. Drop cached responses (force a re-fetch on next call)
synutils.files.clearCache()
Scala
// 1. Inspect a file object
val info = synutils.files.get("daily_report")
println(s"${info("objectTypeKey")} ${info("fileFormat")}")
// 2. Full cloud paths for every file in the object
val paths: List[String] = synutils.files.getPath("daily_report")
// 3. Curated metadata subset
val meta = synutils.files.getMetadata("daily_report")
println(meta("type"), meta("fileFormat"))
// 4. Read one file as a DataFrame (uses the object's configured delimiter)
val df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)
// 5. Override CSV options for this read only
val tsv = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep = "\t",
header = false,
inferSchema = false )
// 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
val events = synutils.files.createDataFrame("event_dump", "events.json")
val sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
val orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
val records = synutils.files.createDataFrame("user_records", "users.avro")
// 7. Drop cached responses
synutils.files.clearCache()
Tip:
- getPath() returns paths for all files in the object — useful when you want Spark to read everything in one go via spark.read.csv(synutils.files.getPath("daily_report")).
- createDataFrame() reads exactly one file at a time, identified by fileName.
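A minimal sketch of the first bullet, assuming a live SparkSession named spark and the daily_report object from the examples above:
# Read every file registered under the object in one Spark job
paths = synutils.files.getPath("daily_report") # full cloud paths for every file
df_all = spark.read.csv(paths, header=True, inferSchema=True)
df_all.show(5)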
fs — direct object-store filesystem (S3 / GCS / Azure / HDFS)
The synutils.fs object auto-routes by URI scheme.
Methods
| Method | Purpose |
|---|---|
| ls() | List entries (non-recursive) |
| | Recursive list |
| exists() | True if file or folder-prefix exists |
| upload() | Upload single file |
| | Download single file |
| | Recursive upload |
| | Recursive download |
| | Server-side copy |
| | Move / rename |
| | Rename in place (same bucket/container) |
| | Delete file or prefix |
| | Create directory marker |
| | Read text content |
| | Read first N bytes as text |
| | Write a text file |
| | Lazy read stream for large files |
| | Upload from a file-like object |
Legacy aliases (Python only)
These names still work — they delegate to their canonical counterpart. Use the canonical name in new code.
Canonical | Legacy name |
|---|---|
Examples
Python
synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
print(synutils.fs.exists("gs://my-bucket/remote.csv"))
print(synutils.fs.ls("gs://my-bucket/"))
Scala
synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
println(synutils.fs.exists("gs://my-bucket/remote.csv"))
println(synutils.fs.ls("gs://my-bucket/"))
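Because routing is by URI scheme, the same calls work unchanged across stores — a short Python sketch using only the methods shown above (bucket names are placeholders):
# Same API, different backends — the scheme on the URI picks the store
for uri in ["gs://my-bucket/remote.csv", "s3://other-bucket/remote.csv"]:
    if not synutils.fs.exists(uri):
        synutils.fs.upload("local.csv", uri)
print(synutils.fs.ls("s3://other-bucket/"))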
spark — dataset → DataFrame + write helpers (requires Spark)
| Method | Purpose |
|---|---|
| createDataFrame() | Read dataset into a DataFrame |
| | True if Hive table exists |
| | Create Hive table from DataFrame |
| writeToEventStore() | Write a DataFrame to an event store. Auto-creates the dataset if it doesn't exist. |
| writeDatasetToEventStore() | Convenience wrapper around writeToEventStore() |
| | Push a local file into the event store |
Python
df = synutils.spark.createDataFrame("user_events")
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")Scala
val df = synutils.spark.createDataFrame("user_events")
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")writeToEventStore — full signature
Writes a DataFrame to a Hive-managed event store table. If the dataset doesn't already exist in the platform's metadata, it is created on the fly using the supplied fileFormat. Partitioning, compression, and process-mode handling are derived from the dataset definition; this method only orchestrates the Spark-side write.
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| | DataFrame | — | Source DataFrame to write |
| | String | — | Dataset name in |
| numPartitions | Int | | If > 0 and the dataset is partitioned, adds |
| partitionedDateColumn | String | | Override the dataset's configured partition column. If set, the dataset is updated to partition by this column before writing |
| | Boolean | | |
| | Boolean | | When |
| fileFormat | FileFormat / String | | Used only when the dataset must be created (404 from the dataset API). Ignored if the dataset already exists. Supported: |
Python
from synutils.file_format import FileFormat
# Minimal write — dataset must already exist
synutils.spark.writeToEventStore(df, "analytics.user_events")
# Append (don't overwrite existing partitions)
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite=False)
# Control output file count per partition (e.g. 8 files per partition)
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions=8)
# Override the partition column for this write only
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn="event_date")
# Auto-create as Avro if dataset doesn't exist yet
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat=FileFormat.AVRO)
# Preserve existing table definition (don't recreate)
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode=False)
Scala
import com.syntasa.synutils.FileFormat
// Minimal write
synutils.spark.writeToEventStore(df, "analytics.user_events")
// Append
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite = false)
// Control output file count per partition
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions = 8)
// Override partition column for this write
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn = "event_date")
// Auto-create as Avro
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat = FileFormat.AVRO.getValue())
// Preserve existing table definition
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode = false)
Heads-up: isOverwrite=True overwrites at the partition level for partitioned tables, not the entire table. For non-partitioned tables it overwrites the whole table.
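A minimal Python sketch of the partitioned case (the event_date column and filter are illustrative; the real partition column comes from the dataset definition):
# Only the 2024-01-01 partition is present in this DataFrame, so with
# isOverwrite=True only that partition is replaced — other partitions are untouched.
day_df = df.filter(df.event_date == "2024-01-01")
synutils.spark.writeToEventStore(day_df, "analytics.user_events", isOverwrite=True)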
lib — install packages and JARs
Python — installPyPI / installCondaPackage
| Method | Purpose |
|---|---|
| installPyPI() | pip install. Auto-downloads |
| installCondaPackage() | conda install (channel |
Set acrossAllNodes=False to install on the kernel only (skip distribution to Spark worker nodes).
# 1. Regular PyPI package
synutils.lib.installPyPI("requests")
# 2. Multiple packages
synutils.lib.installPyPI("requests pandas>=2.0 numpy")
# 3. Local zip from the cluster's deps/python folder
# (resolves to <bucket>/<config-folder>/deps/python/simple_module.zip)
synutils.lib.installPyPI("simple_module.zip")
# 4. Full cloud path (s3://, gs://, or https://)
synutils.lib.installPyPI("s3://syntasa-k3s-dev/pradeepm/python_modules/simple_module.zip")
# 5. Mix everything in one call
synutils.lib.installPyPI("requests simple_module.zip s3://syntasa-k3s-dev/pradeepm/python_modules/other.whl")
# 6. Kernel-only install (don't distribute to Spark workers)
synutils.lib.installPyPI("requests", acrossAllNodes=False)
# 7. conda-forge package
synutils.lib.installCondaPackage("scipy")Scala — installJars
Install JARs into the running Scala kernel.
Accepts three source styles, mixed freely in a single space-delimited call:
| Source | Example |
|---|---|
| Maven coordinates | org.joda:joda-money:1.0.4 |
| Filename in the cluster's deps folder ( | greeterWithDollar.jar |
| Full cloud-storage path | gs://syn-400-development-kub/pradeepm/greeter.jar |
// Single source
synutils.lib.installJars("org.joda:joda-money:1.0.4")
synutils.lib.installJars("greeterWithDollar.jar")
synutils.lib.installJars("gs://syn-400-development-kub/pradeepm/greeter.jar")
// Multiple, space-delimited
synutils.lib.installJars("org.joda:joda-money:1.0.4 greeterWithDollar.jar gs://syn-400-development-kub/pradeepm/greeter.jar")