The BigQuery (BQ) Process is a Synthesizer, Composer, and Orchestrator process that builds and manages user-defined analytics datasets using pre-written and verified working code. It is designed to allow the user to query an existing BigQuery table to create a new dataset within the Syntasa environment.
To utilize the BQ Process, the following is required:
- An existing BigQuery connection
- An input query and output configuration
Process Configuration
The BQ Process has very few parameters that need to be set; the key is to ensure the query is verified as working in a query editor before attempting to deploy.
- Drag the BQ Process onto the canvas
- Drag and connect a dataset to the BQ Process (this is required before the process can be edited)
Parameters
This section provides the information Syntasa needs to configure the BQ Process.
Click on the BQ Process node to access the editor.
BQ Query Process Type
Use the drop-down to select an available process type.
- Append
  - Partitioned - appends records to one or more specified partitions
  - Non-partitioned - appends records to the existing table
- Overwrite
  - Partitioned - historical data is updated with the new partitions
  - Non-partitioned - historical data is wiped and the new data is loaded
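As a rough illustration only (this is an assumption about equivalent BigQuery behavior, not Syntasa's exact implementation, and the table names are hypothetical), the two non-partitioned modes behave approximately like the following standard SQL statements:

-- Overwrite / Non-partitioned: replace the entire table with the query result
CREATE OR REPLACE TABLE `my_project.my_dataset.sample_output` AS
SELECT session_date, customer_code, country
FROM `my_project.my_dataset.sample_event`;

-- Append / Non-partitioned: add the query result to the existing table
INSERT INTO `my_project.my_dataset.sample_output` (session_date, customer_code, country)
SELECT session_date, customer_code, country
FROM `my_project.my_dataset.sample_event`;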
File Upload
- Click on the green paperclip icon to browse to the desired file to import
- Users can upload a .sql file instead of typing the query into the Query UI text box
Legacy SQL button
- Yes - Query will be run using Legacy SQL
- No - Query will be run using Standard SQL
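For reference, the two dialects reference tables differently. The snippets below are a minimal comparison using a hypothetical table (they are not meant to be run together in one query box):

-- Legacy SQL: square-bracket table references, with a colon between project and dataset
SELECT customer_code
FROM [my_project:my_dataset.sample_event]
LIMIT 10

-- Standard SQL: backtick table references, with dots throughout
SELECT customer_code
FROM `my_project.my_dataset.sample_event`
LIMIT 10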
Query UI
Users can write their own SQL queries in this text box.
- DDL support
  - Create non-partitioned tables
  - Create partitioned tables
- DML support
  - Industry-standard DML statements
- Parameters / features
  - '@fromDate' - allows the start date to be chosen dynamically from the job configuration UI instead of being hard-coded (see the sketch after the example query below)
  - '@toDate' - allows the end date to be chosen dynamically from the job configuration UI instead of being hard-coded
  - '@partitionDate' - allows the partition date(s) to be chosen dynamically from the job configuration UI instead of being hard-coded
  - '@database'
    - Allows the database reference to switch automatically between the development and production environments
    - Database details are retrieved from the event store configuration
- Use --END-- to mark the end of each query if there are multiple queries in a single BigQuery process.
- Example Query:
  - Note that there are 2 queries in the single query box below
  - The queries leverage the '@database' and '@partitionDate' features
CREATE TABLE if not exists `syntasa-402-qa.@database.sample_event` (
session_date DATE,
customer_code STRING,
country STRING
)
PARTITION BY session_date;
--END--
SELECT
CAST(substr(date_time , 1 , 10) as DATE) session_date,
customer_code,
country
FROM `syntasa-402-qa.@database.tb_event`
WHERE _PARTITIONTIME = TIMESTAMP("@partitionDate");
--END--
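As a further sketch of the '@fromDate' and '@toDate' parameters (the table names reuse those from the example above, and the substitution format is assumed to follow the same pattern as '@partitionDate'), a pair of DML statements could reload a date range chosen at job run time:

DELETE FROM `syntasa-402-qa.@database.sample_event`
WHERE session_date BETWEEN DATE("@fromDate") AND DATE("@toDate");
--END--
INSERT INTO `syntasa-402-qa.@database.sample_event` (session_date, customer_code, country)
SELECT
  CAST(substr(date_time, 1, 10) AS DATE) session_date,
  customer_code,
  country
FROM `syntasa-402-qa.@database.tb_event`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP("@fromDate") AND TIMESTAMP("@toDate");
--END--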
Output
The Output tab provides the ability to name the table and the name displayed on the graph canvas, along with selecting whether to load to BigQuery (BQ) if in the Google Cloud Platform (GCP), load to Redshift or RDS if in Amazon Web Services (AWS), or simply write to HDFS if using on-premise Hadoop.
Dataset
- Table Name - defines the name of the database table where the output data will be written. Ensure the table name is unique among all other tables within the defined Event Store; otherwise, data previously written by another process will be overwritten.
- Display Name - label of the process output icon displayed on the app graph canvas.
- Compression - option to compress the files, reducing the amount of storage required. Compression adds a small processing overhead, but if the raw files will be stored indefinitely, it is recommended to compress them. If the raw files will be removed after Event Enrich processing, it is recommended to turn Compression off.
- Event Store Name - name of the Event Store selected when initially creating the app. This option is not configurable; if any of the Event Store Name, Database, or Location details are incorrect, back out of the app and make the changes in the Event Stores settings screen.
- Database - name of the database in the event store to which the data will be written.
- Location - storage bucket or HDFS location where source raw files will be stored for the Syntasa Event Enrich process to use.
Expected Output
The expected output of the BQ Process is either a new table or an update to an existing table, produced from the results of the query provided in the editor window. This table can be queried directly using an enterprise-provided query engine. Additionally, the table can serve as the foundation for building other datasets.
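For example, assuming the Table Name was set to sample_event and the Event Store database is named demo_db (both hypothetical), the output could be queried directly in the BigQuery console:

SELECT session_date, country, COUNT(*) AS events
FROM `syntasa-402-qa.demo_db.sample_event`
GROUP BY session_date, country
ORDER BY session_date;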