In this article, we will walk you through the complete process of exporting data from Syntasa’s environment to an external database using the TO DB process. This is useful when you have performed data transformations and want to push the final dataset into a database such as Postgres, Snowflake, or Athena for further reporting or operational use.
You’ll learn how to:
- Set up and configure a workflow with an Event Store, the TO DB process, and a Database Connection
- Understand how partitioned data works with TO DB
- Execute the job and verify the data in your external database
Prerequisites
- An Event Store node populated with the data you wish to export.
- A configured database connection (such as Postgres) in your Syntasa environment.
- An app with a blank development workflow canvas.
Steps To Export Data to External Database via To DB Process
Once you open a newly created app, you will be taken to the development workflow canvas. Follow the steps below to build the end-to-end workflow:
Configure the Input Source
The input source for the TO DB process must be an Event Store. This Event Store can come from various upstream processors in your workflow—such as Spark Processor, From DB, or any other process that outputs data within the same application.
If you want to use data from a different application as input, you can do so by dragging a new Event Store node from the left-side palette and placing it onto the workflow canvas.
To configure it:
- Click on the Event Store node you just added.
- A configuration panel will open on the right-hand side.
- Select the appropriate Event Store and choose the dataset associated with it. Please note that development datasets cannot be deployed to production.
- Once the dataset is selected, click the tick mark (✓) to apply and close the panel.
- Your Event Store is now configured and ready to serve as the input for the TO DB process.
-
Configure the Database Connection
Once the Event Store is configured as the input, the next step is to add and configure the database connection where the data will be exported. This connection points to an external database, and the TO DB process will write data into a table within that database.
Syntasa supports a variety of external databases including Postgres, Snowflake, Amazon Athena, Google BigQuery, and others. For this guide, we’ll use a Postgres connection as an example.
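If you would like to confirm that the target database is reachable before wiring it into the workflow, a quick connectivity check outside Syntasa can help. The following is a minimal sketch using Python and psycopg2; the host, database, user, and password are placeholders, not values taken from your environment.

```python
# Optional sanity check, run outside Syntasa: confirm the target Postgres
# database accepts connections with the credentials used in the connection.
import psycopg2

conn = psycopg2.connect(
    host="your-postgres-host",   # placeholder values
    port=5432,
    dbname="analytics",
    user="syntasa_user",
    password="********",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])     # prints the Postgres server version if reachable
conn.close()
```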
To configure the output connection:
- Drag and drop the Database connection node from the left-hand palette onto the workflow canvas.
- Click the database node to open its configuration panel on the right-hand side.
- From the dropdown, select the Postgres connection that was already created as part of your prerequisites.
- Once selected, click the tick mark (✓) to save the configuration.
Add the TO DB Process
With both the input (Event Store) and output (database) nodes in place, it’s time to add the TO DB process, which will export the data to the destination. Follow these steps:
- Drag and drop the TO DB node from the left-side palette onto the canvas.
- Connect the three components in sequence: Event Store (Input Source) → TO DB (Process) → Postgres (Database Connection). This creates a complete data pipeline that reads data from the Event Store, processes it through the TO DB process, and exports the result to the Postgres table.
- Click the TO DB node to begin configuring it.
Understanding Incremental Load in TO DB
The Incremental Load toggle in the TO DB process controls how data is exported from the input source to the external database, particularly when working with partitioned data. There are three scenarios:
Export All Partitioned Data
If your input source contains partitioned data and you want to export all available partitions, you should:
- Keep the Incremental Load toggle OFF
- During job execution, the execution date range will be ignored
- All partitions from the input source will be exported to the connected database
- Example: If the input contains data for 1st Jan to 5th Jan 2025, and you wish to export the full dataset, simply keep Incremental Load OFF, and execute the job. All five days of data will be sent to the external database.
Export Data for Specific Date Partitions
If your requirement is to export data only for a specific subset of dates, you can achieve that by:
- Keeping the Incremental Load toggle ON
- Selecting the appropriate execution date range when creating or running the job
- Example: To export only the data for 1st Jan to 3rd Jan 2025, keep Incremental Load ON, and select this date range during job execution. Only the data from those selected dates will be exported.
When Input is Non-Partitioned
If your input Event Store is not partitioned, you should:
- Keep the Incremental Load toggle OFF
- The entire dataset will be exported, and the execution date range will have no effect
This configuration allows you to flexibly choose between exporting all data or filtering it by date, based on how your input is organized. The sketch below illustrates the distinction.
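For readers who think in code, the toggle can be pictured as an optional date filter applied before the export. The PySpark sketch below is purely illustrative and is not the internal implementation of the TO DB process; the table names, the event_date partition column, and the JDBC settings are assumptions made for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-db-sketch").getOrCreate()

# Read the Event Store data (the table name is a placeholder).
events = spark.table("event_store.web_events")

incremental_load = True                            # mirrors the Incremental Load toggle
start_date, end_date = "2025-01-01", "2025-01-03"  # mirrors the job's execution date range

if incremental_load:
    # Incremental Load ON: only the selected date partitions are exported.
    events = events.where(
        (events.event_date >= start_date) & (events.event_date <= end_date)
    )
# Incremental Load OFF: no filter is applied, so every partition (or the
# whole non-partitioned dataset) is exported.

(events.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://your-postgres-host:5432/analytics")
    .option("dbtable", "public.web_events_export")
    .option("user", "syntasa_user")
    .option("password", "********")
    .mode("append")
    .save())
```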
Configure the To DB Process
Now that your data pipeline is connected and you understand the role of the Incremental Load toggle, the next step is to configure the TO DB process to define how your data should be exported to the external database.
- Click on the TO DB node on the workflow canvas. This will open the configuration panel, where you can specify various settings.
- In the General tab, you can optionally assign a meaningful name to the TO DB process. This is useful for better identification, especially when working with complex or large workflows.
- Navigate to the Input tab within the configuration panel. Here, enter the name of the table in the connected database to which the data should be exported.
- If the specified table does not exist, it will be automatically created during job execution.
- If the table already exists, the data will be appended or overwritten based on the process mode selected at the time of job creation.
- The Incremental Load toggle is also available in this section. Based on your data requirements—whether exporting all data or only data for a specific date range—you can enable or disable it accordingly. (For more information, refer to the previous section on Incremental Load.)
- The Mapping screen allows you to choose which columns to include in the exported data, rename them, or reorder them (a conceptual sketch follows this list). To understand the full set of features available on this screen, see the article: [Mapping columns in To DB Process]
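If it helps to see the idea in code, the column selection, renaming, and reordering performed on the Mapping screen is conceptually similar to the DataFrame projection below. This is an illustration only, not what the TO DB process runs internally, and the column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mapping-sketch").getOrCreate()
events = spark.table("event_store.web_events")  # placeholder table name

# Conceptual equivalent of the Mapping screen: include only some columns,
# rename a few, and control the order they appear in the exported table.
export_df = events.select(
    F.col("visitor_id").alias("user_id"),      # include and rename
    F.col("event_date"),                       # include as-is
    F.col("page_url").alias("landing_page"),   # include and rename
)
export_df.printSchema()
```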
Once all fields and options have been configured:
- Click the tick mark (✓) at the top of the configuration panel to save the TO DB process.
- Then click Save and Lock on the main toolbar to save the entire workflow.
Your data pipeline is now fully configured and ready for execution. In the next step, you’ll run the job and review the exported data in the configured database.
Run the To DB Process
Once the TO DB process is fully configured and your workflow is saved, the final step is to execute the process to export the data to the configured database. This is done by creating and running a job.
- To begin, create a job for your workflow. If you’re not familiar with this step, refer to the article Creating a Job for detailed guidance.
Unlike other application processes, the To DB process only exports data to the configured database; it does not perform checks or updates against that database. As a result, process modes such as “Add New” and “Add New and Replace Modified” are not applicable to the To DB process.
- Once the job is created, go ahead and execute it. For more details on how to run a job, see: Executing a Job.
- After successful execution, navigate to the Query Editor and run a query that selects all records from the destination table. You should see all the data, with the columns you configured on the Mapping screen of the To DB process. If you also want to verify the export outside the platform, see the sketch below.
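As a quick check from outside the platform, a short script against the destination table might look like the following sketch. The connection details and table name are placeholders and should match whatever you configured in the earlier steps.

```python
import psycopg2

conn = psycopg2.connect(
    host="your-postgres-host",   # placeholder connection details
    dbname="analytics",
    user="syntasa_user",
    password="********",
)
with conn.cursor() as cur:
    # Row count gives a quick sanity check against the source Event Store.
    cur.execute("SELECT COUNT(*) FROM public.web_events_export;")
    print("rows exported:", cur.fetchone()[0])

    # Spot-check a few rows and the mapped column names.
    cur.execute("SELECT * FROM public.web_events_export LIMIT 5;")
    for row in cur.fetchall():
        print(row)
conn.close()
```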
Deploy the Code to Production
To run the workflow in production, simply deploy the app from development to production, create a production job, and execute it. For more details on development vs. production workflows and the deployment process, refer to the following articles: