The From DB process in Syntasa is a built-in application component that extracts data directly from external databases into the Syntasa environment. It supports data ingestion from sources such as PostgreSQL, SQL Server, Amazon Athena, and other supported databases.
Once data is pulled into Syntasa, it is available for further transformation, analysis, or enrichment by downstream processes such as Transform and Spark Processor.
The extraction pipeline involves:
- Setting up a secure connection to the source database.
- Using the From DB process to query and fetch data.
- Configuring storage options, including partitioning and file formats.
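The three steps above can be sketched outside Syntasa with the Python standard library. This is only an illustration of the flow, not Syntasa's implementation: an in-memory SQLite database stands in for the source database, CSV stands in for the output file format, and the names `extract_to_partitions`, `run_pipeline_demo`, `orders`, and `order_date` are hypothetical.

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path


def extract_to_partitions(conn, query, partition_column, out_dir):
    """Run `query` and write one CSV file per distinct partition value.

    Returns a dict mapping each partition value to its row count.
    """
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    part_idx = columns.index(partition_column)

    # Group rows by the value of the partition column.
    groups = defaultdict(list)
    for row in cursor:
        groups[row[part_idx]].append(row)

    written = {}
    for value, rows in groups.items():
        # Directory naming (column=value) is an assumed, Hive-style layout.
        part_dir = Path(out_dir) / f"{partition_column}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with (part_dir / "part-0000.csv").open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(columns)
            writer.writerows(rows)
        written[value] = len(rows)
    return written


def run_pipeline_demo():
    # Step 1: connect to the source database (in-memory SQLite as a stand-in).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2024-01-01", 10.0), (2, "2024-01-01", 5.5), (3, "2024-01-02", 7.25)],
    )
    # Steps 2 and 3: query the data, then store it partitioned by order_date.
    with tempfile.TemporaryDirectory() as out_dir:
        counts = extract_to_partitions(
            conn, "SELECT id, order_date, amount FROM orders", "order_date", out_dir
        )
        dirs = sorted(p.name for p in Path(out_dir).iterdir())
    conn.close()
    return counts, dirs
```

In Syntasa itself, these steps are configured through the connection and the From DB process settings rather than written by hand.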
Features of the 'From DB' Process
- Database Agnostic: Works with multiple relational databases such as PostgreSQL, SQL Server, Athena, MySQL, and more.
- Easy Connectivity: Drag-and-drop interface to link database connections within the app workflow.
- Partitioned and Non-Partitioned Storage: Supports both bulk data loads and fine-grained partitioned storage based on selected columns.
- Hourly and Daily Partitioning: Partition the output by date, or by date and hour, for high-granularity datasets.
- Multiple Output Formats: Store extracted data in Parquet, ORC, Avro, Delta or Textfile formats to suit performance and compatibility needs.
- Reusability: Once data is ingested, it can be reused across workflows for different transformation logic and client-specific use cases.
- Cloud Native: Extracted data is stored securely in cloud storage environments, making it accessible for Syntasa-native processing.
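To make the partitioning and format options above concrete, here is a minimal sketch of how an output configuration might be modeled. The class `OutputConfig`, the lowercase format names, the scheme names `none`, `daily`, and `hourly`, and the `event_date=` / `event_hour=` directory layout are all assumptions for illustration; they are not taken from Syntasa's configuration schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Formats named in the feature list; the lowercase spellings are assumptions.
SUPPORTED_FORMATS = {"parquet", "orc", "avro", "delta", "textfile"}
PARTITION_SCHEMES = {"none", "daily", "hourly"}


@dataclass(frozen=True)
class OutputConfig:
    """Hypothetical description of how extracted data is stored."""

    file_format: str = "parquet"
    partition_scheme: str = "none"

    def __post_init__(self):
        # Reject values outside the known formats and schemes.
        if self.file_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported file format: {self.file_format}")
        if self.partition_scheme not in PARTITION_SCHEMES:
            raise ValueError(f"unsupported partition scheme: {self.partition_scheme}")

    def partition_path(self, ts: datetime) -> str:
        """Return the relative partition directory for a record timestamp."""
        if self.partition_scheme == "none":
            return ""  # non-partitioned: all data lands in one location
        path = f"event_date={ts:%Y-%m-%d}"
        if self.partition_scheme == "hourly":
            path += f"/event_hour={ts:%H}"
        return path
```

For a record stamped 09:30 on 5 March 2024, the daily scheme yields `event_date=2024-03-05` and the hourly scheme yields `event_date=2024-03-05/event_hour=09`. Hourly partitions are smaller and more numerous, which suits datasets that are read in narrow time windows.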
Supported Databases
The 'From DB' process supports a wide range of relational, NoSQL, and cloud-based databases, including:
- Athena
- DocumentDB
- Elasticsearch
- MongoDB
- MySQL
- Oracle
- PostgreSQL
- Redshift
- Snowflake
- SQL Server
- Teradata
- Other JDBC-supported databases
Use Cases for the 'From DB' Process
Here are some typical scenarios where the From DB process is highly effective:
- Initial Data Ingestion from Legacy Systems
Organizations with existing on-premise or cloud-hosted databases can use From DB to perform initial bulk loads of data into Syntasa, enabling centralized processing and analytics.
- Incremental Data Loads Based on Partition Columns
Set up partitioned extraction (e.g., using a date column) to load only new or updated data. This keeps datasets up to date without reloading everything each time.
- Hourly Streaming-Like Data Feeds
When working with real-time or near real-time data, enable hourly partitioning to store fine-grained data segments for more timely processing and analytics.
- Supporting Data Science and Machine Learning
Data extracted via From DB can serve as input to machine learning models running in Syntasa or in external tools.
- Data Archiving and Lake Storage
Organizations can use the From DB process to regularly archive database tables into cost-effective cloud storage in formats like Parquet or ORC.
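The incremental-load use case above rests on a simple idea: remember the last partition value already loaded (a "watermark") and query only rows beyond it. The sketch below illustrates that idea with an in-memory SQLite database as a stand-in source. It is not how Syntasa implements incremental loads, and the names `fetch_new_rows`, `run_incremental_demo`, `events`, and `event_date` are hypothetical.

```python
import sqlite3


def fetch_new_rows(conn, table, partition_column, last_loaded):
    """Return rows whose partition column is strictly greater than the watermark."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated;
    # pass only trusted table and column names. The watermark value is bound.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {partition_column} > ? "
        f"ORDER BY {partition_column}"
    )
    return conn.execute(query, (last_loaded,)).fetchall()


def run_incremental_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-03")],
    )
    # Assume 2024-01-01 was loaded by a previous run.
    watermark = "2024-01-01"
    rows = fetch_new_rows(conn, "events", "event_date", watermark)
    # Advance the watermark to the newest date seen, if any rows arrived.
    new_watermark = max(r[1] for r in rows) if rows else watermark
    conn.close()
    return rows, new_watermark
```

Storing dates as ISO 8601 strings (`YYYY-MM-DD`) keeps string comparison consistent with chronological order, which is why the `>` filter works here without date parsing.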