In this article, we’ll walk through a complete step-by-step guide to configuring and executing the TO FILE process in Syntasa, which lets you export data from the Syntasa environment to external cloud storage as CSV or text files.
Whether your input data is partitioned by date or non-partitioned, this guide will show you how to handle both scenarios correctly. You will learn how to set up an end-to-end data pipeline using components like the Event Store, TO FILE process, and GCS or S3 connection. We'll also explain how to use the Incremental Load toggle to control which portions of the data get exported based on the job’s execution date range.
By the end of this guide, you'll know how to:
- Build and connect the required nodes in your workflow canvas.
- Choose between Standard (CSV) and Key-Value (text) output formats.
- Configure TO FILE settings for partitioned and non-partitioned data.
- Execute the job and generate the file(s) in your cloud storage.
Prerequisites
- An Event Store node populated with the data you wish to export.
- A configured cloud storage connection, such as GCS (Google Cloud Storage) or Amazon S3, in your Syntasa environment.
- An app with a blank development workflow canvas.
Steps to Generate a File via the TO FILE Process
Once you open a newly created app, you will be taken to the development workflow canvas. Follow the steps below to build the end-to-end workflow:
Step 1: Configure the Input Source
The input source for the TO FILE process must be an Event Store. This Event Store can come from various upstream processors in your workflow—such as Spark Processor, From DB, or any other process that outputs data within the same application.
If you want to use data from a different application as input, you can do so by dragging a new Event Store node from the left-side palette and placing it onto the workflow canvas.
To configure it:
- Click on the Event Store node you just added.
- A configuration panel will open on the right-hand side.
- Select the appropriate Event Store and choose the dataset associated with it. Please note that development datasets cannot be deployed to production.
- Once the dataset is selected, click the tick mark (✓) to apply and close the panel.
- Your Event Store is now configured and ready to serve as the input for the TO FILE process.
Step 2: Configure the Output Connection
Once the Event Store is configured as the input, the next step is to add and configure the cloud storage connection where the output file will be saved. This connection points to a cloud bucket, and the file will be created inside this bucket.
Syntasa supports various file-based storage systems like GCS (Google Cloud Storage), Amazon S3, and Azure Blob Storage. For this guide, we’ll use a GCS connection as an example.
To configure the output connection:
- Drag and drop the GCS connection node from the left-hand palette onto the workflow canvas.
- Click the GCS node to open its configuration panel on the right side.
- From the dropdown, select the GCS connection that was already created as part of your prerequisites.
- Once selected, click the tick mark (✓) to save the configuration.
Step 3: Add the TO FILE Process
With both the input (Event Store) and output (GCS) nodes in place, it’s time to add the TO FILE process, which will generate the file. Please follow these steps:
- Drag and drop the TO FILE node from the left-side palette onto the canvas.
- Connect the three components in sequence: Event Store (Input Source) → TO FILE (Process) → GCS (Cloud Connection). This creates a complete data pipeline that reads data from the Event Store, processes it through the TO FILE process, and writes the result to the cloud storage connection.
- Click the TO FILE node to begin configuring it.
Step 4: Understanding the Incremental Load Toggle
Before configuring the TO FILE process, it’s important to understand the Incremental Load toggle, as it plays a key role when handling partitioned or non-partitioned data. This setting determines how the input data is filtered and exported, especially in workflows where data is divided into partitions—commonly by date or ID.
Depending on whether your input data is partitioned and whether you want your output to be partitioned, the Incremental Load toggle should be set accordingly. Below are the different scenarios that explain how this setting affects the TO FILE process; a short illustrative sketch follows them:
Non-Partitioned Input → Non-Partitioned Output
- Keep Incremental Load OFF
- In this case, your input data is not divided into partitions (such as by date).
- When Incremental Load is OFF, the TO FILE process exports all available records in the Input Source.
- The job’s execution date range does not affect the data output.
- You can select any date range when executing, and the result will always be the full dataset.
Partitioned Input → Non-Partitioned Output
- Keep Incremental Load OFF
- Your input data is partitioned, but you want to combine everything into a single output file (or files, based on max split size).
- In this setup, TO FILE will read all partitions, regardless of the execution date range.
- The data will be merged into a consolidated output without filtering by date.
Partitioned Input → Partitioned Output
- Turn Incremental Load ON
- Here, both your input data and your intended output are partitioned (e.g., by date).
- When Incremental Load is ON, TO FILE will only read and export data from partitions that fall within the execution date range specified when running the job.
- This is ideal when you want to create output files for a specific range of dates, such as daily exports or scheduled batch uploads.
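To make the difference concrete, here is a minimal PySpark-style sketch of the two behaviours. It is purely illustrative and is not how Syntasa implements TO FILE internally; the table name `events`, the partition column `event_date`, and the date range are all hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.table("events")  # hypothetical date-partitioned input table

# Incremental Load OFF: every record in the input is exported,
# regardless of the job's execution date range.
full_export = events

# Incremental Load ON: only partitions that fall inside the job's
# execution date range (a hypothetical week here) are read and exported.
incremental_export = events.filter(
    F.col("event_date").between("2024-01-01", "2024-01-07")
)
```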
Step 5: Configure the TO FILE Process
Now that your data pipeline is connected and you understand the role of the Incremental Load toggle, the next step is to configure the TO FILE process to control how your output file will be generated.
- Click on the TO FILE node on the canvas. This will open the configuration panel where you can define how your data should be exported.
- In the General tab, you can give your TO FILE process a meaningful name. This is optional, but helpful when managing large workflows.
- Under the Input tab, you will choose the desired Output Format: Standard or Key-Value.
- Once you choose the format, the related configuration fields will appear. (A short illustrative sketch of both output styles appears after these configuration steps.)
- If you select the Standard Output Format, you will see fields such as delimiter and a header toggle. To learn how to fill in these fields correctly, refer to the article: Understanding Parameters for Generating Standard Output File.
- If you select the Key-Value Output Format, you will be presented with fields such as key-value delimiters, array delimiters, and more. To understand how to configure these settings properly, see the article: Understanding Parameters for Generating Key-Value Output File.
- If you want your output file name to include dynamic values such as row count, date, or partition number, you can use parameters like {@rows}, {@date}, {@split}, etc. To learn more about these parameters and how they work, visit the article: Parameters in TO FILE process. (A hypothetical example of how such a pattern resolves is shown after these configuration steps.)
- The Mapping screen allows you to choose which columns to include in the output file, rename them, or reorder them (for Standard format). To understand the full set of features available on this screen, see the article: Mapping Columns in 'To File' Process.
- Once all fields and options have been configured:
- Click the tick mark (✓) at the top of the configuration panel to save the TO FILE process.
- Then click Save and Lock on the main toolbar to save the entire workflow.
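If it helps to picture the two output styles, the following standalone PySpark sketch shows roughly what each format produces. It is an illustration under stated assumptions, not Syntasa’s implementation: the sample columns, delimiters, and the bucket path gs://my-bucket/exports/... are hypothetical, and the actual TO FILE options may differ.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("u1", "page_view", 3), ("u2", "click", 1)],
    ["user_id", "event", "event_count"],
)

# Standard format: a delimited (CSV-style) file, optionally with a header row.
(df.write.mode("overwrite")
   .option("header", "true")   # header toggle
   .option("sep", ",")         # field delimiter
   .csv("gs://my-bucket/exports/standard/"))  # hypothetical bucket path

# Key-Value format: each row becomes key=value pairs joined by a delimiter.
kv = df.select(
    F.concat_ws(
        "|",  # pair delimiter
        F.concat(F.lit("user_id="), F.col("user_id")),
        F.concat(F.lit("event="), F.col("event")),
        F.concat(F.lit("event_count="), F.col("event_count").cast("string")),
    ).alias("value")
)
kv.write.mode("overwrite").text("gs://my-bucket/exports/key_value/")
```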
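Similarly, here is a purely hypothetical illustration of how a file-name pattern built from those parameters might resolve at run time. The actual substitution is performed by Syntasa when the job runs; the pattern and the substituted values below are made up for illustration.

```python
# Hypothetical pattern using the dynamic file-name parameters described above.
pattern = "daily_export_{@date}_{@split}_{@rows}.csv"

resolved = (
    pattern
    .replace("{@date}", "2024-01-07")   # run/partition date (example value)
    .replace("{@split}", "001")         # file split number (example value)
    .replace("{@rows}", "150000")       # row count in this split (example value)
)
print(resolved)  # daily_export_2024-01-07_001_150000.csv
```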
Your data pipeline is now fully configured and ready for execution. In the next step, you’ll run the job and review the output file generated in your connected cloud storage.
Step 6: Run the TO FILE Process
Once the TO FILE process is fully configured and your workflow is saved, the final step is to execute the process to generate the output file. This is done by creating and running a job.
- To begin, you need to create a job for your workflow. If you’re not familiar with this step, refer to the article: Creating a Job for detailed guidance.
- The process mode selected during job creation (such as “Add New” or “Add New and Replace Modified”) does not affect the TO FILE process. Unlike other application processes, TO FILE’s purpose is to generate files from the input data; it doesn’t perform checks or updates on existing data, so it ignores the process mode.
- Once the job is created, go ahead and execute it. For more details on how to run a job, see: Executing a Job.
- After successful execution, navigate to the cloud storage path you specified in the TO FILE configuration (under the Parameter screen). You will find the generated file(s) stored there, named according to the pattern and parameters you defined. (An optional verification sketch follows below.)
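If you want to confirm the output from outside the Syntasa UI, a small script like the following can list the exported objects. This is optional and assumes a GCS destination, an environment already authenticated to Google Cloud, and the google-cloud-storage client library; the bucket name and prefix are hypothetical.

```python
# List the exported files and their sizes in the destination bucket.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("my-bucket", prefix="exports/"):  # hypothetical path
    print(blob.name, blob.size)
```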