Handling Partitioned Data with “Code Managed” Process Mode – SYNTASA™

In a standard Syntasa workflow, the platform’s State Service automatically manages the lifecycle of partitioned tables. When you select a mode such as Replace Date Range, the platform identifies which partitions already exist in the target, drops those that fall within the specified window, and writes the new data.
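The drop-then-write behavior of Replace Date Range can be illustrated with a toy in-memory model. This sketch assumes, purely for illustration, that a partitioned table is a Python dict mapping each partition date to its rows. `replace_date_range` is a hypothetical helper, not part of the Syntasa platform, and as a modeling assumption it ignores incoming rows that fall outside the window.

```python
from datetime import date

# Toy model only: a partitioned table is a dict of partition date -> list of row dicts.
# replace_date_range is a hypothetical helper illustrating the behavior described above.

def replace_date_range(table, new_rows, start, end):
    """Drop every partition in [start, end], then write new_rows into their partitions."""
    # Keep only partitions outside the window (the "drop" step)
    result = {d: rows for d, rows in table.items() if not (start <= d <= end)}
    for row in new_rows:
        d = row["event_date"]
        if start <= d <= end:  # modeling assumption: rows outside the window are ignored
            result.setdefault(d, []).append(row)
    return result

table = {
    date(2024, 1, 1): [{"id": 1, "event_date": date(2024, 1, 1)}],
    date(2024, 1, 2): [{"id": 2, "event_date": date(2024, 1, 2)}],
}
new_rows = [{"id": 3, "event_date": date(2024, 1, 2)}]
table = replace_date_range(table, new_rows, date(2024, 1, 2), date(2024, 1, 2))
print({d.isoformat(): [r["id"] for r in rows] for d, rows in table.items()})
# → {'2024-01-01': [1], '2024-01-02': [3]}
```

Partition 2024-01-01 lies outside the window and is left alone; partition 2024-01-02 is replaced wholesale rather than appended to.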
However, complex data engineering patterns, such as Delta Lake merges, SCD Type 2 logic, or late-arriving data handling, often require more surgical control than the standard modes allow. The Code Managed process mode (introduced in SMA-11472) shifts responsibility for partition management from the Syntasa orchestration engine directly to your Spark or Python code.

How “Code Managed” Changes Partition Logic

When a job step is configured as Code Managed, the platform alters its interaction with the data plane in two fundamental ways:

Suspension of Automated Partition Cleanup

In standard modes, Syntasa executes “Pre-Processing” steps to clear out existing data in the target partitions to prevent duplication.

In Code Managed Mode: Syntasa will not issue any DROP PARTITION or DELETE commands. Your code is responsible for keeping the target table consistent and for ensuring that it does not inadvertently duplicate records.

Access to the Full Date Context

Standard modes filter the input data to expose only “new” or “modified” dates.

In Code Managed Mode: The platform provides your code with the full date range requested for the job. This allows your logic to scan historical partitions, perform lookups across a wider window, or decide internally which specific partitions need to be updated based on the data content rather than the platform’s metadata.
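A minimal sketch of that last idea follows. It assumes the step receives the requested window as two dates (`from_date` and `to_date` are hypothetical names here) and that each source row carries an `event_date`. The code walks the full window and decides from the data itself which partitions actually need rewriting.

```python
from datetime import date, timedelta

def partitions_to_update(from_date, to_date, source_rows):
    """Split the requested window into dates present in the source and dates with no new data."""
    # Every calendar date in the requested window, inclusive
    window = [from_date + timedelta(days=i) for i in range((to_date - from_date).days + 1)]
    present = {row["event_date"] for row in source_rows}
    update = [d for d in window if d in present]      # partitions the data says must change
    skip = [d for d in window if d not in present]    # partitions that can stay as they are
    return update, skip

rows = [
    {"id": 7, "event_date": date(2024, 3, 2)},
    {"id": 8, "event_date": date(2024, 3, 2)},
    {"id": 9, "event_date": date(2024, 3, 5)},  # e.g. a late-arriving record
]
update, skip = partitions_to_update(date(2024, 3, 1), date(2024, 3, 7), rows)
print("update:", [d.isoformat() for d in update])  # → update: ['2024-03-02', '2024-03-05']
print("skip:", len(skip), "dates")                 # → skip: 5 dates
```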

Implementation Patterns

When using Code Managed mode with partitioned data, you should adopt one of the following patterns in your Notebook or Code step:

Pattern A: The Delta Lake Merge (Recommended)

Instead of dropping and replacing entire partitions, use the MERGE command to update existing records and insert new ones. This is the most common reason to use Code Managed mode.

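# Example: Merging updates into a partitioned Delta table
# Assumes `spark` (the SparkSession) and `source_df` (the incoming updates) already exist in the step
from delta.tables import DeltaTable

target_table = DeltaTable.forPath(spark, "s3://my-bucket/gold_table")

# Match on the business key plus the partition column so each key stays scoped to its date partition
(
    target_table.alias("target")
    .merge(
        source_df.alias("source"),
        "target.id = source.id AND target.event_date = source.event_date",
    )
    .whenMatchedUpdateAll()      # update rows that already exist in the target
    .whenNotMatchedInsertAll()   # insert rows that are new to the target
    .execute()
)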
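The `whenMatchedUpdateAll()` and `whenNotMatchedInsertAll()` combination behaves like an upsert on the match key, so re-running it with the same source leaves the target unchanged. Below is a toy model in plain Python. It assumes each `(id, event_date)` key appears at most once in the source, and it does not model anything Delta itself handles (transactions, schema enforcement, file rewrites).

```python
def merge(target, source, key=("id", "event_date")):
    """Upsert source rows into target by key: update matches, insert the rest."""
    merged = {tuple(row[k] for k in key): row for row in target}
    for row in source:
        # Matched key: row is replaced (update all); unmatched key: row is added (insert all)
        merged[tuple(row[k] for k in key)] = row
    return list(merged.values())

target = [{"id": 1, "event_date": "2024-05-01", "amount": 10}]
updates = [
    {"id": 1, "event_date": "2024-05-01", "amount": 15},  # matched: updated
    {"id": 2, "event_date": "2024-05-01", "amount": 20},  # not matched: inserted
]
once = merge(target, updates)
twice = merge(once, updates)
print(once == twice)  # → True
```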

Pattern B: Manual Partition Overwrite

If you are using standard Parquet tables but need to control exactly when a partition is replaced (e.g., only if a certain threshold of data is met), you must handle the overwrite explicitly.

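// Example: Manual partition overwrite in Scala (standard Parquet table)
// replaceWhere only applies to Delta tables; for Parquet, use dynamic partition overwrite instead
import org.apache.spark.sql.functions.col
// Replace only the partitions present in the DataFrame, leaving all other partitions untouched
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.filter(col("event_date").between(fromDate, toDate)) // fromDate/toDate: the job's date window
  .write
  .mode("overwrite")
  .insertInto("my_partitioned_table") // matches columns by position; event_date must be last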
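The threshold condition mentioned above can be decided per partition before anything is written. A minimal sketch follows. It assumes you have already computed incoming row counts per date (for example, with a group-by count) and that `min_rows` is a hypothetical, job-specific parameter. Dates at or above the threshold are selected for overwrite; the rest are reported so their existing partitions stay in place.

```python
def split_by_threshold(counts, min_rows):
    """Split dates into (overwrite, keep_existing) based on incoming row counts."""
    overwrite = sorted(d for d, n in counts.items() if n >= min_rows)
    keep_existing = sorted(d for d, n in counts.items() if n < min_rows)
    return overwrite, keep_existing

counts = {"2024-06-01": 120_000, "2024-06-02": 950, "2024-06-03": 118_500}
overwrite, keep = split_by_threshold(counts, min_rows=100_000)
print(overwrite)  # → ['2024-06-01', '2024-06-03']
print(keep)       # → ['2024-06-02']
```

In the Spark step, the `overwrite` list would then restrict the DataFrame (for example with an `event_date IN (...)` filter) before it is written.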

Critical Considerations

The “Always-Run” Nature

Because Code Managed mode bypasses the platform’s “Skip Check,” the step will execute even if no new files have arrived in the source. If your partition logic is expensive (e.g., scanning a massive historical archive), ensure you have internal checks in your code to exit early if there is no work to be done.
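One way to structure such an early exit is sketched below. It assumes the step can list its source files with their modification times and keeps a watermark (the newest modification time already processed) in durable storage. Both the listing and the watermark store are hypothetical here, since Code Managed mode does not track them for you.

```python
def new_files_since(listing, watermark):
    """Return paths whose modification time is newer than the stored watermark."""
    return sorted(path for path, mtime in listing.items() if mtime > watermark)

def run_step(listing, watermark, process):
    """Exit early when nothing new arrived; otherwise process and return the new watermark."""
    pending = new_files_since(listing, watermark)
    if not pending:
        return watermark, "skipped: no new input"
    process(pending)  # the expensive partition logic runs only when there is work
    return max(listing[p] for p in pending), f"processed {len(pending)} file(s)"

listing = {"s3://bucket/in/a.parquet": 100.0, "s3://bucket/in/b.parquet": 250.0}
wm, status = run_step(listing, 250.0, process=lambda files: None)
print(status)      # → skipped: no new input
wm, status = run_step(listing, 150.0, process=lambda files: None)
print(status, wm)  # → processed 1 file(s) 250.0
```

Persist the returned watermark only after the write succeeds; otherwise a failed run would be skipped on retry.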

Idempotency is Mandatory

Since Syntasa is no longer managing the “State” of what has been processed for this step, your code must be idempotent. Running the same job multiple times with the same parameters should result in the same final state in the partitioned table.
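A cheap way to check this property is to run the step’s write logic once and twice from the same starting state and compare the results. The sketch below uses a toy partition dict in place of a real table. `append_write` and `overwrite_write` are hypothetical stand-ins: the first appends rows each time it runs, while the second replaces the partitions it touches.

```python
import copy

def append_write(table, rows):
    """Not idempotent: appends rows to their partitions on every run."""
    for row in rows:
        table.setdefault(row["event_date"], []).append(row)

def overwrite_write(table, rows):
    """Idempotent: replaces each touched partition with exactly the incoming rows."""
    touched = {}
    for row in rows:
        touched.setdefault(row["event_date"], []).append(row)
    table.update(touched)

def is_idempotent(write, initial, rows):
    """Apply write once and twice to copies of the same state; the results must match."""
    once, twice = copy.deepcopy(initial), copy.deepcopy(initial)
    write(once, rows)
    write(twice, rows)
    write(twice, rows)
    return once == twice

rows = [{"id": 1, "event_date": "2024-07-01"}, {"id": 2, "event_date": "2024-07-01"}]
print(is_idempotent(append_write, {}, rows))     # → False
print(is_idempotent(overwrite_write, {}, rows))  # → True
```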

Metadata Synchronization

If your code creates new partitions manually (e.g., by writing directly to an S3 path), you may need to issue an MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION command within your code to ensure the Glue Catalog or Hive Metastore is aware of the new data. Standard Syntasa modes do this automatically; Code Managed mode does not.
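When the step knows exactly which partitions it wrote, registering just those is typically cheaper than having MSCK REPAIR TABLE rescan the whole table location. The sketch below builds one `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement per written partition, assuming a Hive-style `event_date=YYYY-MM-DD` directory layout. The table name and base path are hypothetical. In a Spark step, each statement would be passed to `spark.sql(...)`.

```python
def add_partition_statements(table, partition_col, values, base_path):
    """Build one ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement per written partition."""
    statements = []
    for value in sorted(set(values)):  # de-duplicate so each partition is registered once
        location = f"{base_path.rstrip('/')}/{partition_col}={value}"
        # Values are assumed to be trusted date strings; no quote escaping is done here
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({partition_col}='{value}') LOCATION '{location}'"
        )
    return statements

for stmt in add_partition_statements(
    "analytics.events", "event_date", ["2024-08-02", "2024-08-01"], "s3://my-bucket/events/"
):
    print(stmt)
```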

Summary: When to Choose Code Managed for Partitions

Use Case	Recommended Mode
Standard daily/hourly batch append	Add New Only
Correcting a specific date range of data	Replace Date Range
Using Delta Lake MERGE or UPDATE	Code Managed
Complex logic involving multiple target tables	Code Managed
Custom file-naming or sub-partitioning logic	Code Managed
