The BQ Process in Syntasa is primarily used to fetch data from a BigQuery (BQ) table, apply SQL-based transformations, and write the processed results into a new output table.
This article provides a step-by-step guide on how to handle non-partitioned data in a BQ Process — from setting up input connections to configuring and executing the process.
STEP 1: Setting Up the Input Connection for the BQ Process
Let’s assume that the app has already been created. In a typical BigQuery (BQ) process pipeline, the flow of data follows this sequence: Input Connection → BQ Process → Output Table.
The first step in building this pipeline is to set up the input connection, which defines the source from which the BQ Process will read data. Once established, it serves as the foundation for executing queries and transforming data within the BQ Process. The input connection can be configured in three ways:
Option 1: Connect Using a BQ Connection
You can create a BQ Connection in Syntasa, which links your project to BigQuery. To learn more about creating a BQ connection, please refer to the article BigQuery Connection.
When you connect the BQ Process to a BQ Connection, you only need to mention the dataset and table name in your SQL query — the project name is not required because it is already configured in the connection setup.
Example: SELECT * FROM dataset.table;
Here, Syntasa automatically uses the project configured in the BQ connection resource.
Option 2: Connect Using a BQ Table (from another process)
You can also connect the BQ Process to a BQ Table that was generated as an output of another process (for example, a From File process). However, it’s important to ensure that the Load to BQ toggle was enabled when that source process was executed.
If the toggle was off, the output table will not be loaded to BigQuery, and therefore, the BQ Process will not be able to access it.
When using a BQ Table input, you can leverage Syntasa’s built-in parameters, such as @InputTable1 and @OutputTable1, which automatically reference the correct BigQuery table. This eliminates the need to hardcode dataset or table names. To learn more about parameters, please refer to the article Parameters in Syntasa Code Processes.
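For instance, the difference between a hardcoded reference and a parameterized one looks like this (the project, dataset, and table names below are placeholders, not actual objects from this walkthrough):

-- Hardcoded reference, tied to one specific environment:
SELECT * FROM my_project.my_dataset.my_table;

-- Parameterized reference, resolved by Syntasa at runtime:
SELECT * FROM @InputTable1;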
Option 3: Connect Using a BQ Table (via Event Store)
You can also connect the BQ Process to a dataset available in the Event Store. If there is any dataset listed under the Event Store that has already been loaded into BigQuery within any Syntasa application, it can be directly used as an input source for your BQ Process. This approach allows you to seamlessly reuse existing, centralized datasets across multiple apps without duplicating data or creating additional connections.
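As a minimal sketch, a query against a reused Event Store dataset looks the same as any other BigQuery input (the dataset, table, and column names below are illustrative, not actual Event Store objects):

SELECT visitor_id, event_timestamp          -- illustrative columns
FROM event_store_ds.web_events              -- illustrative Event Store dataset and table
WHERE event_timestamp BETWEEN '@fromDate' AND '@toDate';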
STEP 2: Setting Up the BQ Process
Once the input connection is configured, the next step is to add and set up the BQ Process. Drag and drop the BQ Process component onto the canvas, link it with the input connection, and define the SQL logic that will process and transform the data.
For this example, let’s assume that the input is connected using a BQ Connection.
You will now have a data pipeline like this:
BQ Connection → BQ Process → Output BQ Table
When you open the BQ Process, you will find three tabs under the Configure menu:
- General – Used to specify the display name of the BQ Process within the app.
- Code – The main screen where you define your BigQuery SQL for data transformation. This section also supports dynamic parameters (such as @InputTable1) to make the code reusable and environment-independent.
- Output – Used to rename the output table that will be stored in BigQuery or to add additional output tables based on your processing needs. You can also set the Partition Scheme to None when creating a non-partitioned output. The default value is Daily.
For this walkthrough, our primary focus will be on the Code tab — this is where we write the SQL query that performs the data transformation.
Below is an example SQL query that creates a non-partitioned output table in the BQ Process when the input source is a BQ Connection:
CREATE TABLE IF NOT EXISTS @OutputTable1 AS
SELECT *
FROM estore_demo.bqdemo
WHERE created_at BETWEEN '@fromDate' AND '@toDate';
Explanation:
- @OutputTable1 – Represents the output table defined in the process configuration. You can view or edit this mapping under the Output tab in the Parameters section.
- estore_demo.bqdemo – Refers to the input dataset (estore_demo) and table (bqdemo) from which the data is being read. Since the BQ Process is connected via a BQ Connection, there is no need to specify the project name; it is already linked through the connection.
- WHERE created_at BETWEEN '@fromDate' AND '@toDate' – Filters the records based on the created_at column using the date range parameters @fromDate and @toDate. These parameters are automatically populated at runtime when the job is executed for a specific date range.
Example: If the job is scheduled for the range 1st January 2025 to 3rd January 2025, then:
@fromDate = 2025-01-01
@toDate = 2025-01-03
and the query will automatically filter data within this range.
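To make the substitution concrete, here is a sketch of how the query above would look once the parameters are resolved for that range (the output table reference is a placeholder; the actual name comes from the mapping in the Output tab):

CREATE TABLE IF NOT EXISTS my_project.my_dataset.output_table AS  -- placeholder for the resolved @OutputTable1
SELECT *
FROM estore_demo.bqdemo
WHERE created_at BETWEEN '2025-01-01' AND '2025-01-03';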
When the input source is a BQ Table, the query can be written as:
CREATE TABLE IF NOT EXISTS @OutputTable1 AS
SELECT *
FROM @InputTable1
WHERE created_at BETWEEN '@fromDate' AND '@toDate';
Explanation:
The query remains the same, with the only difference being that the hardcoded table reference is replaced by the parameter @InputTable1, which represents the input BigQuery table defined in the process configuration. You can view this mapping under the Input tab in the Parameters section.
STEP 3: Understanding BQ Process Type
In the Parameters section, you will find a dropdown named BQ Process Type, which offers two options: Overwrite and Append. By default, the value is set to Overwrite. Each option serves a distinct purpose:
- Overwrite – Select this option when the target table already contains data and you want to replace the existing data with new results.
- Append – Select this option when you want to retain the existing data in the table and add new records to it.
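Conceptually, the two options behave like the following standard BigQuery statements. This is a simplified sketch for intuition only; Syntasa manages the actual write mode internally, so you do not write these statements yourself (table references are placeholders):

-- Overwrite: replace the existing contents of the target table with the new results
CREATE OR REPLACE TABLE my_project.my_dataset.output_table AS  -- placeholder table reference
SELECT * FROM my_project.my_dataset.staging_results;           -- placeholder source

-- Append: keep existing rows and add the new results
INSERT INTO my_project.my_dataset.output_table
SELECT * FROM my_project.my_dataset.staging_results;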
Note: The BQ Process Type dropdown shown under the Parameters section of the BQ Process applies only when creating non-partitioned output tables; it does not apply when creating partitioned outputs. The dropdown also has no effect when the Drop and Replace process mode is used to execute the process.
STEP 4: Executing the BQ Process
Once the BQ Process has been configured, it is ready for execution. You can either create a new job or use the Interactive Mode to run the process in a development environment for testing and validation.
In a production setup, you can also schedule the job as part of a workflow to automate data processing at defined intervals.
Note: When working with a non-partitioned output (i.e., when the partition scheme is set to None), the process modes Add New Only and Add New and Replace Modified do not appear for the BQ Process, as they are not applicable.
If the partition scheme is set to Daily, but the SQL query does not actually generate partitioned output, then Add New Only and Add New and Replace Modified behave the same as Replace Date Range Only, since the process cannot identify partitions to apply the advanced modes correctly.
To learn more about running and managing jobs, please refer to the following articles:
Creating a Job
Scheduling a Job
Interactive Mode
Process Modes
STEP 5: Checking the Output
After the BQ Process has executed successfully, you can verify the results by previewing the output data.
Click on the Output node of the BQ Process to open the Preview tab. This tab displays the first 100 records of the output table, allowing you to quickly confirm that the transformation and data loading have been completed as expected.
For a more detailed view or to validate data at scale, you can also open the output table directly in the BigQuery Console and run custom queries.
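For example, a couple of quick sanity checks in the console might look like this (replace the placeholder table reference with your actual output table):

-- Confirm the total number of rows loaded
SELECT COUNT(*) AS row_count
FROM my_project.my_dataset.output_table;

-- Confirm the date range covered by the output
SELECT MIN(created_at) AS earliest_record, MAX(created_at) AS latest_record
FROM my_project.my_dataset.output_table;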
This walkthrough demonstrated how to create a non-partitioned output table in a BQ Process by transforming data from an existing BigQuery table. By leveraging Syntasa’s built-in parameters, such as @InputTable1, @OutputTable1, @fromDate, and @toDate, you can make your SQL code more dynamic, reusable, and environment-independent.
This approach not only streamlines data processing within Syntasa but also ensures that your pipelines remain flexible and easily maintainable across different projects and environments.