The Add New Only process mode in Syntasa ensures that only new date partitions (those that do not already exist in the output table) are processed and added. Unlike other process modes, it does not overwrite or modify existing date partitions, making it an ideal choice when you only need to add new date partitions without altering previously ingested records.
This mode is particularly useful for incremental data loads: historical data remains unchanged while new date partitions are appended to the output table.
How does it work?
The Add New Only process mode checks the date partitions of the incoming data, compares them against the date partitions of the existing output table, and then applies the following two rules:
- If a date partition already exists in the output table and is also present in the incoming data, and it lies within the execution date range, the existing data for that partition remains unchanged. Even if the input source contains updates for the same partition, they are ignored.
- If a date partition is not present in the output table but exists in the input source, and it lies within the execution date range, a new date partition is created, and the corresponding data is inserted.
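The two rules can be sketched as a simple set comparison. This is only an illustration, not Syntasa's implementation or API: the function name `plan_add_new_only` and its parameters are hypothetical, and the sketch assumes that incoming partitions outside the execution date range are simply not processed.

```python
from datetime import date


def plan_add_new_only(existing, incoming, start, end):
    """Return (to_add, to_skip) for the incoming date partitions.

    existing: date partitions already in the output table
    incoming: date partitions present in the input source
    start, end: inclusive execution date range
    """
    # Assumption: only partitions inside the execution date range are considered.
    in_range = {d for d in incoming if start <= d <= end}
    # Rule 1: partition already exists in the output table -> left unchanged.
    to_skip = sorted(in_range & set(existing))
    # Rule 2: partition is missing from the output table -> created and inserted.
    to_add = sorted(in_range - set(existing))
    return to_add, to_skip


# Hypothetical partitions, chosen only to exercise both rules.
existing = {date(2025, 3, 1), date(2025, 3, 2)}
incoming = {date(2025, 3, 2), date(2025, 3, 3), date(2025, 3, 9)}

to_add, to_skip = plan_add_new_only(
    existing, incoming, start=date(2025, 3, 1), end=date(2025, 3, 5)
)
print("add: ", to_add)
print("skip:", to_skip)
```

Here 2nd March is skipped because it already exists, 3rd March is added, and 9th March falls outside the assumed execution range, so it appears in neither list.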
Example
In the example below, we explain the scenarios in which data under a date partition in the output table is preserved or newly added. Refer to the data below to understand this example:
- Existing Output Table (A) currently includes partitions for: 1st Jan – 4th Jan 2025
- New Input Source (B) contains data for the dates: 1st Jan, 2nd Jan and 5th Jan 2025
- Job Execution Date Range is defined as: 1st Jan – 5th Jan 2025
- Job Execution Timestamp: 5th Jan 2025 17:15:00
Let's consider an output table (A) containing data from 1st Jan 2025 to 4th Jan 2025:
| Date Partition | Name | Modified Timestamp |
| --- | --- | --- |
| 1st Jan 2025 | Joe | 4th Jan 2025 19:22:22 |
| 2nd Jan 2025 | Mark | 4th Jan 2025 19:22:22 |
| 3rd Jan 2025 | Ricky | 4th Jan 2025 19:22:22 |
| 4th Jan 2025 | Alex | 4th Jan 2025 19:22:22 |
Now, assume the new input source (B) contains the following records:
| Date Partition | Name |
| --- | --- |
| 1st Jan 2025 | John |
| 2nd Jan 2025 | Elon |
| 5th Jan 2025 | James |
Job execution:
After executing the job in Add New Only mode for the period 1st Jan 2025 to 5th Jan 2025, the output table will be updated as follows:
| Date Partition | Name | Modified Timestamp |
| --- | --- | --- |
| 1st Jan 2025 | Joe | 4th Jan 2025 19:22:22 |
| 2nd Jan 2025 | Mark | 4th Jan 2025 19:22:22 |
| 3rd Jan 2025 | Ricky | 4th Jan 2025 19:22:22 |
| 4th Jan 2025 | Alex | 4th Jan 2025 19:22:22 |
| 5th Jan 2025 | James | 5th Jan 2025 17:15:00 |
Explanation of changes:
- 1st Jan 2025 & 2nd Jan 2025: These partitions already exist in the output table, so even though the input source contains new records for these dates (John, Elon), they are ignored. These two date partitions in the output table remain unaffected.
- 3rd Jan 2025 & 4th Jan 2025: No data was provided for these date partitions in the incoming data, so they remain unchanged in the output table.
- 5th Jan 2025: This date partition does not exist in the output table but exists in the input source, so it is created and populated in the output table.
Since the partitions from 1st Jan to 4th Jan 2025 remain unaffected, their modified timestamps in the output table stay unchanged. Only the 5th Jan 2025 partition carries a new modified timestamp (the job execution timestamp), as it is a newly added partition.
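The example can be replayed with a small in-memory model. This is a sketch under stated assumptions rather than how Syntasa stores or writes data: the table is modelled as a Python dict with one record per date partition, and `apply_add_new_only` is a hypothetical helper name.

```python
from datetime import date, datetime


def apply_add_new_only(output, incoming, start, end, run_ts):
    """Return a copy of `output` with only the missing in-range partitions added."""
    result = dict(output)  # existing partitions are carried over untouched
    for day, name in incoming.items():
        if start <= day <= end and day not in result:
            # New partition: insert the data and stamp it with the run time.
            result[day] = (name, run_ts)
    return result


loaded = datetime(2025, 1, 4, 19, 22, 22)
# Output table (A): date partition -> (name, modified timestamp)
table_a = {
    date(2025, 1, 1): ("Joe", loaded),
    date(2025, 1, 2): ("Mark", loaded),
    date(2025, 1, 3): ("Ricky", loaded),
    date(2025, 1, 4): ("Alex", loaded),
}
# Input source (B): date partition -> name
source_b = {
    date(2025, 1, 1): "John",
    date(2025, 1, 2): "Elon",
    date(2025, 1, 5): "James",
}
run_ts = datetime(2025, 1, 5, 17, 15, 0)  # job execution timestamp

result = apply_add_new_only(
    table_a, source_b, date(2025, 1, 1), date(2025, 1, 5), run_ts
)
for day in sorted(result):
    name, modified = result[day]
    print(day, name, modified)
```

The printed rows match the output table above: Joe, Mark, Ricky and Alex keep their 4th Jan timestamps, and only James is added with the job execution timestamp.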
When to use the Add New Only process mode?
This mode is best suited for scenarios where data is appended incrementally, and modifications to existing partitions are not required. Below are common use cases:
- Incremental Data Processing: When the data source continuously provides new date partitions, and you need to add them without modifying existing ones.
- Historical Data Preservation: When past records must remain unchanged while only new data is added.
- Optimized Performance for Large Datasets: Since existing partitions are not rewritten, this mode reduces processing time and storage overhead.
- Scheduled Data Loads: Ideal for scheduled jobs where only newly available data needs to be ingested without affecting previous partitions.