In data science workflows, especially in time-partitioned datasets, managing data updates with precision is critical. Syntasa’s Replace Data Range process mode provides a reliable and controlled method to update output tables based on a specified execution date range.
This process mode is designed to ensure freshness of data for a given range, without disturbing historical data outside the range. It intelligently replaces, preserves, or add new data partitions depending on whether they exist in the input source, the output table, or both.
How 'Replace Data Range' Process Mode Works?
The Replace Data Range process mode evaluates each date partition in relation to the input source, the output table, and the specified execution date range. The behavior falls into the following four cases:
- Overwrite Existing Partitions
If a date partition is present in both the input and the output table, and it lies within the execution date range, that partition in the output is overwritten, even if the data is identical. You can refer Example 1 below for better understanding. -
Add New Partitions
If a date partition is found in the input source but not in the output table, and it lies within the execution date range, it is added as a new partition in the output. You can refer Example 1 below for better understanding -
Retain Partition Missing in Input Source
If a date partition is found in the output table but not in the input source, and it lies within the execution date range, it is left untouched in the output table. You can refer Example 2 below for better understanding - Retain Partition Outside Job Execution Range
Any date partition in the output table that lies outside the specified execution date range is not affected at all, regardless of its presence or absence in the input. You can refer Example 3 below for better understanding
Examples
Let's break down three practical examples with detailed before-and-after tables. For simplicity, assume the execution timestamp is 7-Apr-2025 10:00 AM
in all examples.
Example 1
This example demonstrates basic behavior where partitions are either replaced (if already present in the output table) or added (if missing), all within the specified execution date range.
Refer to the data below to understand this example:
- Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
- New Input source (B) contains data for the date range: 1st Jan – 9th Jan 2025
- Job Execution Date Range is defined as: 1st Jan – 7th Jan 2025
Existing Output Table (A)
Date Partition | Data | Modified Timestamp |
2nd Jan 2025 | chrome | 1-Apr-2025 09:00 AM |
3rd Jan 2025 | edge | 1-Apr-2025 09:00 AM |
4th Jan 2025 | firefox | 1-Apr-2025 09:00 AM |
5th Jan 2025 | safari | 1-Apr-2025 09:00 AM |
Input Source (B)
Date Partition | Data |
1st Jan 2025 | opera |
2nd Jan 2025 | edge |
3rd Jan 2025 | edge |
4th Jan 2025 | safari |
5th Jan 2025 | safari |
6th Jan 2025 | firefox |
7th Jan 2025 | firefox |
8th Jan 2025 | firefox |
9th Jan 2025 | chrome |
Resulting Output Table After Execution
Date Partition | Data | Modified Timestamp | Remarks |
1st Jan 2025 | opera | 7-Apr-2025 10:00 AM |
➕ New Partition |
2nd Jan 2025 | edge | 7-Apr-2025 10:00 AM |
🔁 Replaced |
3rd Jan 2025 | edge | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
4th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced |
5th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
6th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
7th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
Note: The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.
Example 2
It shows replacement of partitions within the execution date range, and also illustrates how existing partitions outside the execution range remain unchanged.
Refer to the data below to understand this example:
- Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
- New Input source (B) contains data for the date range: 1st Jan – 9th Jan 2025
- Job Execution Date Range is defined as: 3rd Jan – 7th Jan 2025
Existing Output Table (A)
Date Partition | Data | Modified Timestamp |
2nd Jan 2025 | chrome | 1-Apr-2025 09:00 AM |
3rd Jan 2025 | edge | 1-Apr-2025 09:00 AM |
4th Jan 2025 | firefox | 1-Apr-2025 09:00 AM |
5th Jan 2025 | safari | 1-Apr-2025 09:00 AM |
Input Source (B)
Date Partition | Data |
1st Jan 2025 | opera |
2nd Jan 2025 | edge |
3rd Jan 2025 | edge |
4th Jan 2025 | safari |
5th Jan 2025 | safari |
6th Jan 2025 | firefox |
7th Jan 2025 | firefox |
8th Jan 2025 | firefox |
9th Jan 2025 | chrome |
Resulting Output Table After Execution
Date Partition | Data | Modified Timestamp | Remarks |
2nd Jan 2025 | chrome | 1-Apr-2025 09:00 AM |
✅ Unchanged — outside execution date range |
3rd Jan 2025 | edge | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
4th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced |
5th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
6th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
7th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
Note: The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.
Example 3
This example covers replacement of partitions within the execution range, along with a case where a partition exists in the output table but is missing in the input source — it remains untouched, even though it's within the execution date range.
Refer to the data below to understand this example:
- Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
- New Input source (B) contains data for the date range: 3rd Jan – 9th Jan 2025
- Job Execution Date Range is defined as: 1st Jan – 7th Jan 2025
Existing Output Table (A)
Date Partition | Data | Modified Timestamp |
2nd Jan 2025 | chrome | 1-Apr-2025 09:00 AM |
3rd Jan 2025 | edge | 1-Apr-2025 09:00 AM |
4th Jan 2025 | firefox | 1-Apr-2025 09:00 AM |
5th Jan 2025 | safari | 1-Apr-2025 09:00 AM |
Input Source (B)
Date Partition | Data |
3rd Jan 2025 | edge |
4th Jan 2025 | safari |
5th Jan 2025 | safari |
6th Jan 2025 | firefox |
7th Jan 2025 | firefox |
8th Jan 2025 | firefox |
9th Jan 2025 | chrome |
Resulting Output Table After Execution
Date Partition | Data | Modified Timestamp | Remarks |
2nd Jan 2025 | chrome | 1-Apr-2025 09:00 AM |
✅ Unchanged — date partition not found in input source |
3rd Jan 2025 | edge | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
4th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced |
5th Jan 2025 | safari | 7-Apr-2025 10:00 AM |
🔁 Replaced (even with identical data) |
6th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
7th Jan 2025 | firefox | 7-Apr-2025 10:00 AM |
➕ New Partition |
Note:
1st Jan 2025 record is not counted as it was not part of the job execution date range.
The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.
Important Points
- Data Replacement Happens at the Partition Level, Not Row Level:
- The system does not check for modifications in individual records within a date partition.
- If data exists for a partition, the entire partition is replaced rather than modifying specific rows.
- Ensure That a Proper Date Range Is Selected:
- Since data outside the selected date range is not affected, ensure that the correct range is chosen to avoid unintended gaps in processing. For example, if the data already exists for 1st Jan to 4th Jan 2025 in output table and you run the job for the 1st Jan to 3rd Jan 2025, the data of 4th Jan 2025 will remain unaffected.
When to Use Replace Date Range?
The Replace Date Range process mode is recommended in the following scenarios:
-
Development Workflows (When Schema Structure is Stable):
-
Suitable when testing and refining data within a specific period while keeping the table structure unchanged.
-
Allows reprocessing data for selected date ranges without affecting other periods.
-
-
Production Workflows (Manual Processing for Complete Data Replacement):
-
Can be used in production for manual runs when a complete replacement of data is required for a specific date range.
-
Helps in situations where updates or corrections to a limited time frame are needed.
-
-
Historical Data Reprocessing:
-
Useful when customers or stakeholders request reprocessing of past data due to changes in business logic, data sources, or transformation rules.
-
Ensures that outdated or incorrect data for specific periods is replaced with newly processed results.
-
-
When Data Outside the Processing Date Range Should Be Preserved:
-
Unlike Drop & Replace, which clears the entire table, this mode ensures that records outside the specified date range remain unchanged.
-