From DB - Overview – SYNTASA™

The From DB process in Syntasa is a built-in process that extracts data directly from external databases into the Syntasa environment. It supports ingestion from sources such as PostgreSQL, SQL Server, Amazon Athena, and other supported databases.

Once data is pulled into Syntasa, it is available for further transformation, analysis, or enrichment by downstream processes such as Transform and the Spark processor.

The extraction pipeline involves:

Setting up a secure connection to the source database.
Using the From DB process to query and fetch data.
Configuring storage options, including partitioning and file formats.
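
The three steps above can be illustrated outside of Syntasa with a small, self-contained sketch. This is not how the From DB process is implemented; it only mirrors the idea of connecting, querying, and writing partitioned output. An in-memory SQLite database stands in for the external source, CSV stands in for the output format, and the names `extract_to_partitions`, `orders`, and `order_date` are hypothetical.

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path


def extract_to_partitions(conn, query, partition_column, out_dir):
    """Run `query` and write one CSV file per distinct partition value."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    idx = columns.index(partition_column)

    # Group the fetched rows by the value of the partition column.
    groups = defaultdict(list)
    for row in cursor:
        groups[row[idx]].append(row)

    written = {}
    for value, rows in groups.items():
        # Assumed layout: one "column=value" directory per partition.
        part_dir = Path(out_dir) / f"{partition_column}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with (part_dir / "part-0000.csv").open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(columns)
            writer.writerows(rows)
        written[value] = len(rows)
    return written


# Step 1: a connection (in-memory SQLite stands in for the source database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (2, "2024-01-01", 5.5), (3, "2024-01-02", 7.25)],
)

# Steps 2 and 3: query the source, then write partitioned output.
out_dir = tempfile.mkdtemp()
counts = extract_to_partitions(conn, "SELECT * FROM orders", "order_date", out_dir)
print(counts)
```

In Syntasa itself these choices are made through the process configuration rather than in code; the sketch only shows what the resulting data flow looks like.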

Features of Using the 'From DB' Process

Database Agnostic: Works with multiple databases such as PostgreSQL, SQL Server, Athena, MySQL, and more.
Easy Connectivity: Drag-and-drop interface to link database connections within the app workflow.
Partitioned and Non-Partitioned Storage: Supports both bulk data loads and fine-grained partitioned storage based on selected columns.
Hourly and Daily Partitioning: Partition the output by date, or by date and hour, for high-granularity datasets.
Multiple Output Formats: Store extracted data in Parquet, ORC, Avro, Delta, or Textfile formats to suit performance and compatibility needs.
Reusability: Once data is ingested, it can be reused across workflows for different transformation logic and client-specific use cases.
Cloud Native: Extracted data is stored securely in cloud storage environments, making it accessible for Syntasa-native processing.
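
To make the difference between daily and hourly partitioning concrete, the following sketch derives a partition path from a timestamp. The `key=value` directory naming and the names `partition_path`, `event_date`, and `event_hour` are assumptions chosen for illustration; the actual layout Syntasa writes may differ.

```python
from datetime import datetime


def partition_path(event_time: datetime, hourly: bool = False) -> str:
    """Return a key=value partition path for a timestamp (assumed layout)."""
    path = f"event_date={event_time:%Y-%m-%d}"
    if hourly:
        # Hourly partitioning adds a second level under the date.
        path += f"/event_hour={event_time:%H}"
    return path


ts = datetime(2024, 3, 15, 9, 30)
print(partition_path(ts))               # event_date=2024-03-15
print(partition_path(ts, hourly=True))  # event_date=2024-03-15/event_hour=09
```

Daily partitioning yields one directory per date, while hourly partitioning yields up to 24 subdirectories per date, which is what allows downstream jobs to read a narrower slice of data.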

Supported Databases

The 'From DB' process supports a wide range of relational, document, and cloud-based data stores, including:

Athena
DocumentDB
Elasticsearch
MongoDB
MySQL
Oracle
Postgres
Redshift
Snowflake
SQL Server
Teradata
Other JDBC-supported databases

Use Cases For Using the 'From DB' Process

Here are some typical scenarios where the From DB process is highly effective:

Initial Data Ingestion from Legacy Systems
Organizations with existing on-premises or cloud-hosted databases can use From DB to perform initial bulk loads of data into Syntasa, enabling centralized processing and analytics.
Incremental Data Loads Based on Partition Columns
Set up partitioned extraction (e.g., using a date column) to load only new or updated data. This is ideal for maintaining up-to-date datasets without reloading everything each time.
Hourly Streaming-Like Data Feeds
When working with real-time or near-real-time data, enable hourly partitioning to store fine-grained data segments for more timely processing and analytics.
Supporting Data Science and Machine Learning
Data extracted via From DB can feed machine learning pipelines as input to models running in Syntasa or in external tools.
Data Archiving and Lake Storage
Organizations can use the From DB process to regularly archive database tables into cost-effective cloud storage in formats like Parquet or ORC.
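
The incremental load scenario described above can be sketched as a query that filters on the partition column so that only a given date window is fetched. This is a conceptual illustration, not Syntasa's query generation; SQLite stands in for the source database, and `incremental_query`, `events`, and `event_date` are hypothetical names.

```python
import sqlite3
from datetime import date, timedelta


def incremental_query(table: str, partition_column: str) -> str:
    """Build a query selecting rows in a half-open [start, end) date window."""
    # Identifiers cannot be bound as parameters, so allow only simple names.
    for name in (table, partition_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {partition_column} >= ? AND {partition_column} < ?"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-03")],
)

# Load only the single day that has not been ingested yet.
start = date(2024, 1, 2)
end = start + timedelta(days=1)
rows = conn.execute(
    incremental_query("events", "event_date"),
    (start.isoformat(), end.isoformat()),
).fetchall()
print(rows)  # [(2, '2024-01-02')]
```

A half-open window avoids loading the same boundary row twice when consecutive runs use adjacent date ranges.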
