In the previous article, we learned how to configure the input and schema screen. After configuring these screens, we can also configure the output screen to meet our requirements.
The Output screen has many configurable options. It lets users assign table names and display names shown on the workflow canvas, and it allows selecting the destination for data loading: BigQuery (BQ) when using Google Cloud Platform (GCP), Redshift or RDS when using Amazon Web Services (AWS), or direct writes to HDFS in on-premise Hadoop environments.
At the top of the Output screen, you will see two details:
- Eventstore Name - The name of the Event Store selected when the app was initially created. This field is not configurable here; if the Event Store name, database, or location is incorrect, make the change on the Event Store settings screen. You can view all tables in the development and production databases under the Event Store by navigating to Resources >> Datastores.
- Database - The name of the database in the Event Store where the data will be written.
Dataset
In the From File process, the output is typically connected to a single output node, so you will find only one dataset under the Dataset section. In other application processes, however, the output may generate multiple datasets. This section allows users to configure each dataset individually: renaming the table, enabling compression to reduce storage usage, loading data to an external source (BigQuery, Redshift, etc.), viewing the storage location, and so on. Let’s explore these configuration options in detail.
Table Name
Specifies the name of the database table where the output data will be stored. Ensure the table name is unique within the defined Event Store to prevent data from being overwritten by another process.
If you change the table name, a new table with the updated name will be created, while the previous table will remain unchanged in the database.
Display Name
Represents the label of the output node connected to the From File process, as displayed on the app workflow canvas. This allows you to assign a custom name to the output node for better clarity in the workflow.
Load To BQ/Redshift/Memory
Regardless of the environment in which your application is hosted, Syntasa first stores all output data in Hive tables. This ensures that the data is structured and accessible before it is loaded into any external system. Syntasa provides the option to load Hive output tables into an external data source based on the environment where the application is running. To enable this, a toggle switch is displayed, allowing users to choose whether the data should be loaded into BigQuery (BQ), Amazon Redshift, or stored in memory.
The toggle displayed depends on the environment:
- GCP Environment - Load to BQ (BigQuery)
- If this toggle is enabled, the Hive output table is loaded into BigQuery within the GCP project.
- The Project ID where the data will be stored is configured under Admin Center → Infrastructure.
- If this toggle is disabled, the data remains stored only in Hive tables.
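Conceptually, enabling the toggle corresponds to a `bq load` invocation that copies the exported Hive output files from Cloud Storage into a BigQuery table. The sketch below is illustrative only: Syntasa performs this load internally, and the project, dataset, table, and bucket names shown are hypothetical placeholders, not values from the product.

```python
# Illustrative sketch only: Syntasa performs this load internally.
# All project, dataset, table, and bucket names below are hypothetical.

def build_bq_load_command(project_id: str, dataset: str, table: str,
                          gcs_uri: str, source_format: str = "PARQUET") -> str:
    """Assemble a `bq load` CLI command that would copy exported
    Hive output files from Cloud Storage into a BigQuery table."""
    return (
        f"bq load --project_id={project_id} "
        f"--source_format={source_format} "
        f"{dataset}.{table} {gcs_uri}"
    )

cmd = build_bq_load_command(
    project_id="my-gcp-project",            # set under Admin Center -> Infrastructure
    dataset="syn_dev_db",                   # Event Store database (hypothetical)
    table="web_events_output",              # table name from the Output screen (hypothetical)
    gcs_uri="gs://my-bucket/output/*.parquet",
)
print(cmd)
```

The key point is that the destination (`project.dataset.table`) is derived from the Project ID configured in the Admin Center plus the Event Store database and the table name set on this screen.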
Compression
The Compression option reduces storage space by compressing output files. While this may add some processing time, it is beneficial for files that need to be stored long-term. If you plan to retain raw files indefinitely, enabling the Compression toggle is recommended.
- If Compression is enabled, the output files are compressed to save storage space. If Compression is disabled, the files remain in their uncompressed form.
- If the input file is already compressed, but compression is disabled for the output, the system will uncompress the input file for processing but will not re-compress the output.
- To maintain compression throughout the process, ensure the Compression toggle is enabled for the output as well.
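The trade-off the Compression toggle makes can be seen with a small, self-contained sketch using gzip (one codec commonly used for columnar and text output; the sample data below is invented for illustration): compressing shrinks the stored bytes, while reading the data back requires an extra decompression step.

```python
# Illustrative sketch: how gzip compression shrinks a repetitive text
# output file. The CSV content here is invented sample data.
import gzip

raw = (
    "user_id,event,timestamp\n"
    + "42,page_view,2024-01-01T00:00:00\n" * 1000
).encode()

compressed = gzip.compress(raw)

# Compressed output occupies less storage than the raw bytes.
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
assert len(compressed) < len(raw)

# Reading the data back requires decompression, which is the extra
# processing time the Compression option trades for storage savings.
assert gzip.decompress(compressed) == raw
```

This is why enabling compression is most worthwhile for files retained long-term: the one-time processing cost is paid once, while the storage savings accrue for as long as the files are kept.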
Location
The output data is stored in a storage bucket (for cloud environments) or an on-premise location, making it accessible for further use.
This storage path can be updated under the Event Store settings. However, once tables have been generated in the database, the storage path can no longer be modified.