The BQ Process in Syntasa is primarily used to fetch data from a BigQuery (BQ) table, apply SQL-based transformations, and write the processed results into a new output table.
This article provides a step-by-step guide on how to handle partitioned data in a BQ Process — from setting up input connections to configuring and executing the process.
Creating Partitioned Output on a Daily Basis
STEP 1: Setting up Input Connection for BQ Process
The steps for setting up the input connection in a BQ Process for generating a partitioned output remain the same as described in the article “Creating Partitioned Output in BQ Process.”
Now, let’s move to the next step: setting up the BQ Process.
STEP 2: Setting Up BQ Process
Once the input connection is configured, the next step is to add and set up the BQ Process. Drag and drop the BQ Process component onto the canvas, link it with the input connection, and define the SQL logic that will process and transform the data.
For this example, let’s assume that the input is connected using a BQ Connection.
You will now have a data pipeline like this:
BQ Connection → BQ Process → Output BQ Table

When you open the BQ Process, you will find three tabs under the Configure menu:
- General – Used to specify the display name of the BQ Process within the app.
- Code – The main screen where you define your BigQuery SQL for data transformation. This section also supports dynamic parameters (such as @InputTable1) to make the code reusable and environment-independent.
- Output – Used to rename the output table that will be stored in BigQuery, or to add additional output tables based on your processing needs. You can also adjust the Partition Scheme here; for a BQ Process it defaults to Daily.
Below is an example SQL query that creates a partitioned output table in the BQ Process when the input source is a BQ Connection:
CREATE TABLE IF NOT EXISTS @OutputTable1(
id INTEGER,
country STRING,
city STRING,
login_date_time STRING,
created_at DATE
)
PARTITION BY created_at;
--END--
SELECT *
FROM estore_demo.bqdemo
WHERE created_at BETWEEN '@fromDate' AND '@toDate';
Explanation of above query:
@OutputTable1:
Represents the output table defined in the process configuration. You can view or edit this mapping under the Output tab in the Parameters section. This table is created only if it does not already exist.
Table Schema:
Defines the structure of the output table with the following columns:
- id – Integer column representing the unique identifier of each record.
- country – Name of the country, stored as a string.
- city – Name of the city, stored as a string.
- login_date_time – Login timestamp, stored as a string.
- created_at – Date field used to partition the output table.
PARTITION BY created_at:
Specifies that the output table is partitioned by the created_at column. This means the data will be physically divided by date, improving query performance and reducing cost when filtering by this field.
estore_demo.bqdemo:
Refers to the input dataset (estore_demo) and table (bqdemo) from which the data is being read. Since the BQ Process is connected via a BQ Connection, there is no need to specify the project name; it is already linked through the connection.
WHERE created_at BETWEEN '@fromDate' AND '@toDate':
Filters the records based on the created_at column using the date range parameters @fromDate and @toDate. These parameters are automatically populated at runtime when the job is executed for a specific date range.
Example:
If the job is scheduled for the range 1st January 2025 to 3rd January 2025, then @fromDate = 2025-01-01 and @toDate = 2025-01-03.
The query will automatically retrieve and process data within this range, storing the results in a partitioned output table.
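To make the parameter substitution concrete, the SELECT portion of the example would resolve at runtime to something like the following for that date range (this expanded form is illustrative only; Syntasa performs the substitution internally):

```sql
-- Illustrative runtime expansion of the parameterized SELECT for the
-- range 1st January 2025 to 3rd January 2025
-- (@fromDate and @toDate substituted with concrete dates):
SELECT *
FROM estore_demo.bqdemo
WHERE created_at BETWEEN '2025-01-01' AND '2025-01-03';
```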
When the input source is a BQ table, the query can be written as:
CREATE TABLE IF NOT EXISTS @OutputTable1(
id INTEGER,
country STRING,
city STRING,
login_date_time STRING,
created_at DATE
)
PARTITION BY created_at;
--END--
SELECT *
FROM @InputTable1
WHERE created_at BETWEEN '@fromDate' AND '@toDate';

Explanation of the above query:
The code remains the same, with the only difference being the addition of the parameter @InputTable1, which represents the input BigQuery table defined in the process configuration. You can view this mapping under the Input tab in the Parameters section.
Note: The BQ Process Type dropdown shown under the Parameters section of BQ Process is applicable only for creating non-partitioned output tables. It does not apply when creating partitioned outputs.
Special Note for BQ Tables Generated by the From File Process:
If the input connection is a BQ table generated by the From File process, the column to use for partitioning is _partitiontime. This is not an actual column of the table; it is how partitions are currently recorded in BigQuery's partition metadata. You can use the query below to fetch data from a partitioned BQ table generated by the From File process:
SELECT
t.*,
DATE(_PARTITIONTIME) AS file_date
FROM @InputTable1 AS t
WHERE DATE(_PARTITIONTIME) BETWEEN '@fromDate' AND '@toDate';

In this example, DATE(_PARTITIONTIME) extracts the partition date and aliases it as file_date.
STEP 3: Execute the BQ Process
Once the BQ Process has been configured, it is ready for execution. You can either create a new job or use the Interactive Mode to run the process in a development environment for testing and validation.
In a production setup, you can also schedule the job as part of a workflow to automate data processing at defined intervals.
To learn more about running and managing jobs, please refer to the following articles:
Creating a Job
Scheduling a Job
Interactive Mode
Process Modes
Note: Add New Only and Add New and Replace Modified currently behave the same as Replace Date Range Only. Once they are supported for partitioned output, this article will be updated.
Step 4: Checking the Output
After the BQ Process has executed successfully, you can verify the results by reviewing both the partition statistics and the data preview.
View Partition Statistics
- Click on the Output node of the BQ Process and open the Stats tab.
- This tab displays details of the partitions created, including the number of records in each partition.
- For example, the screenshot below shows partitions for the dates 1st January to 3rd January 2025, with each partition containing 5 records.
Preview Output Data
- Click on the Output node again and open the Preview tab.
- This tab displays the first 100 records from the latest partition of the output table, allowing you to quickly validate that the transformation and data loading were completed as expected.
- The example below shows a preview of the latest partition, i.e., 3rd January 2025, containing 5 records.
Question: How should I create a daily partition when the column is of type DATETIME or TIMESTAMP?
Answer: When your input table contains a DATETIME or TIMESTAMP type column and you want to use it for daily partitioning, BigQuery requires you to extract the date portion from that column.
This is because daily partitions are created on a DATE value — not on a full datetime or timestamp.
Example Query – Creating a Daily Partitioned Table With Column Type TIMESTAMP
To achieve this, simply cast or convert your datetime/timestamp column to a DATE inside the PARTITION BY clause.
CREATE TABLE IF NOT EXISTS @OutputTable1(
login_date_time TIMESTAMP,
id INTEGER,
country STRING,
city STRING,
created_at DATE
)
PARTITION BY DATE(login_date_time);
--END--
SELECT
*
FROM dataset.table
WHERE DATE(login_date_time) BETWEEN '@fromDate' AND '@toDate';

Explanation of the above query:
- The input table contains a column named login_date_time, which is of type TIMESTAMP.
- We are creating a new output table that uses this same column.
- Since we want to partition daily, we apply PARTITION BY DATE(login_date_time). This extracts only the date portion (e.g., 2025-01-03) from the timestamp.
- The WHERE clause also uses DATE(login_date_time) so that the filter aligns with the partition and the query remains efficient by pruning partitions.
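As a quick standalone illustration (not part of the process configuration), DATE() applied to a timestamp literal keeps only the calendar-date portion:

```sql
-- DATE() discards the time-of-day portion of a timestamp:
SELECT DATE(TIMESTAMP '2025-01-03 04:27:15') AS partition_date;
-- partition_date = 2025-01-03
```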
Creating Partitioned Output on Hourly Basis
If your input table contains a DATETIME or TIMESTAMP column and you want to create an hourly partitioned output based on that column, follow the steps below:
Step 1: Configure the Partition Scheme
Open the Output tab of the BQ Process and set the Partition Scheme of output table to Hourly.
Step 2: Add the Query in the Code Tab
Add the query that creates the hourly partitioned output table. Here is an example:
CREATE TABLE IF NOT EXISTS @OutputTable1(
login_date_time TIMESTAMP,
id INTEGER,
country STRING,
city STRING,
created_at DATE
)
PARTITION BY TIMESTAMP_TRUNC(login_date_time, HOUR);
--END--
SELECT
*
FROM dataset.table
WHERE login_date_time BETWEEN '@fromDate' AND '@toDate';

Explanation:
To create an hourly partition, the partitioning expression must meet two conditions:
- It must be a TIMESTAMP or DATETIME value.
- It must be truncated (rounded down) to the nearest hour.
The function TIMESTAMP_TRUNC(login_date_time, HOUR) removes minutes and seconds, converting values like 2025-01-03 04:27:15 to 2025-01-03 04:00:00. This ensures all rows belonging to the same hour fall into the same partition. Without this truncation in the PARTITION BY clause, BigQuery cannot determine which hourly partition a record should go into. The WHERE clause does not need any conversion because the filtering is also done at the timestamp level, and we are partitioning hourly, not daily.
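The hourly bucketing can be seen in isolation with the following standalone query (timestamp literals without a zone are interpreted as UTC):

```sql
-- Illustration of how TIMESTAMP_TRUNC(..., HOUR) buckets rows by hour:
SELECT
  ts,
  TIMESTAMP_TRUNC(ts, HOUR) AS hour_partition
FROM UNNEST([
  TIMESTAMP '2025-01-03 04:27:15',
  TIMESTAMP '2025-01-03 04:59:59',
  TIMESTAMP '2025-01-03 05:00:01'
]) AS ts;
-- The first two timestamps both map to 2025-01-03 04:00:00 UTC,
-- while the third maps to 2025-01-03 05:00:00 UTC.
```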
Step 3: Checking the Output Node
- Click on the Output node of the BQ Process and open the Stats tab.
- This tab displays details of the partitions created for each date and hour, including the number of records in each partition.