The Add New Only process mode in Syntasa ensures that only new date partitions (those that do not already exist in the output table) are processed and added. Unlike other process modes, it does not overwrite or modify existing date partitions, making it an ideal choice when you only need to append new data without altering previously ingested records.
This mode is particularly useful for incremental data loads: historical data remains unchanged while new date partitions are appended to the output table.
How does it work?
The Add New Only process mode checks the date partitions of the incoming data, compares them against the date partitions of the existing output table, and then applies the following two rules:
- If a date partition already exists in the output table and is also present in the incoming data, the existing data for that partition remains unchanged. Even if the incoming data contains updates for the same partition, they are ignored.
- If a date partition is not present in the output table but exists in the incoming data, a new date partition is created, and the corresponding data is inserted.
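The two rules can be sketched in a few lines of Python. This is only an illustration of the logic, not Syntasa's implementation: the function name `add_new_only` and the dictionary-based table representation (partition date mapped to that partition's rows) are assumptions made for the example.

```python
from datetime import date


def add_new_only(output_table, incoming):
    """Return a new table in which only missing date partitions are added.

    Both arguments are dicts mapping a partition date to that partition's
    rows. This is a hypothetical model, not a Syntasa API.
    """
    # Shallow copy: existing partitions are carried over as they are.
    result = dict(output_table)
    for partition, rows in incoming.items():
        if partition in result:
            # Rule 1: the partition already exists, so incoming rows are ignored.
            continue
        # Rule 2: the partition is new, so it is created and populated.
        result[partition] = list(rows)
    return result


existing = {date(2025, 3, 1): ["a"]}
incoming = {date(2025, 3, 1): ["changed"], date(2025, 3, 2): ["b"]}
merged = add_new_only(existing, incoming)
```

Here `merged` keeps `["a"]` for 1 March, because that partition already existed, and gains a new 2 March partition containing `["b"]`.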
Example Scenario
Let's consider an output table containing data from 1st Jan 2025 to 4th Jan 2025:
Date Partition | Name |
1st Jan 2025 | Joe |
2nd Jan 2025 | Mark |
3rd Jan 2025 | Ricky |
4th Jan 2025 | Alex |
Now, assume the new incoming data contains the following records:
Date Partition | Name |
1st Jan 2025 | John |
2nd Jan 2025 | Elon |
5th Jan 2025 | James |
Job execution:
After executing the job in Add New Only mode for the period 1st Jan 2025 to 5th Jan 2025, the output table will be updated as follows:
Date Partition | Name |
1st Jan 2025 | Joe |
2nd Jan 2025 | Mark |
3rd Jan 2025 | Ricky |
4th Jan 2025 | Alex |
5th Jan 2025 | James |
Explanation of changes:
- 1st Jan 2025 & 2nd Jan 2025: These partitions already exist in the output table, so even though the incoming data contains new records for these dates (John, Elon), they are ignored. These two date partitions in the output table remain unaffected.
- 3rd Jan 2025 & 4th Jan 2025: No data was provided for these date partitions in the incoming data, so they remain unchanged in the output table.
- 5th Jan 2025: This date partition does not exist in the output table but exists in the incoming data, so it is created and populated in the output table.
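The same scenario can be reproduced with plain Python sets to show which partitions fall into each of the three cases above. The `classify` helper and the one-name-per-partition dictionaries are hypothetical simplifications for illustration. The sketch does not model the job's date range or how Syntasa stores partitions.

```python
from datetime import date

# Output table before the job runs (one name per date partition).
output_table = {
    date(2025, 1, 1): "Joe",
    date(2025, 1, 2): "Mark",
    date(2025, 1, 3): "Ricky",
    date(2025, 1, 4): "Alex",
}
# Incoming data for the run.
incoming = {
    date(2025, 1, 1): "John",
    date(2025, 1, 2): "Elon",
    date(2025, 1, 5): "James",
}


def classify(output_table, incoming):
    """Group partition dates by what Add New Only does with them."""
    existing, arriving = set(output_table), set(incoming)
    return {
        "ignored": sorted(existing & arriving),    # in both: incoming rows dropped
        "untouched": sorted(existing - arriving),  # only in the output table
        "added": sorted(arriving - existing),      # only in the incoming data
    }


outcome = classify(output_table, incoming)

# Later entries win in a dict merge, so existing partitions override incoming ones.
final_table = {**incoming, **output_table}

for partition in sorted(final_table):
    print(partition.isoformat(), final_table[partition])
```

The printed table matches the result shown above: Joe, Mark, Ricky, and Alex are kept for 1st to 4th Jan, and James is added for 5th Jan.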
When to use 'Add New Only' process mode?
This mode is best suited for scenarios where data is appended incrementally, and modifications to existing partitions are not required. Below are common use cases:
- Incremental Data Processing: When the data source continuously provides new date partitions, and you need to add them without modifying existing ones.
- Historical Data Preservation: When past records must remain unchanged while only new data is added.
- Optimized Performance for Large Datasets: Since existing partitions are not rewritten, this mode reduces processing time and storage overhead.
- Scheduled Data Loads: Ideal for scheduled jobs where only newly available data needs to be ingested without affecting previous partitions.