This guide serves as a table of contents for getting started with the From File process in Syntasa. Each section provides an in-depth look at a different aspect of using From File to ingest and process external data. Use the links below to navigate to the relevant topics for detailed explanations and practical use cases.
- Understanding From File in General
This article provides an overview of the From File process, its purpose, and its role in the Syntasa workflow. It is ideal for those who are new to data ingestion and need to understand the fundamental concepts behind ingesting data from external files.
- File Types Supported
This article explains the types of files supported by the From File process, including delimited files (CSV, TSV), Apache logs, and Zip files. It details how to select the correct file type based on the nature of your data.
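As a rough illustration only (not Syntasa functionality), the sketch below shows how a file's extension might suggest the file type you would choose on screen; the mapping, function name, and default are all hypothetical:

```python
import pathlib

# Hypothetical extension-to-type mapping; in Syntasa the file type is
# selected in the UI, and this sketch only illustrates the decision.
FILE_TYPE_BY_EXTENSION = {
    ".csv": "delimited",
    ".tsv": "delimited",
    ".log": "apache_log",
    ".zip": "zip_archive",
}

def pick_file_type(path: str) -> str:
    """Return the From File type implied by a file's extension."""
    ext = pathlib.Path(path).suffix.lower()
    return FILE_TYPE_BY_EXTENSION.get(ext, "delimited")  # assumed default

print(pick_file_type("events_2024-01-01.tsv"))  # -> delimited
```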
- Configuring Input Screen for File Ingestion
In this article, you will learn how to configure the Input screen for file ingestion in Syntasa, including setting source paths, file patterns, and compression types, and enabling features such as Auto Configure, validation, and incremental loading. It also covers customizable date formats and schema detection for precise handling of event data.
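For orientation only, here is a minimal sketch of the kinds of settings the Input screen captures, expressed as a Python dictionary; every key and value is an assumption for illustration, not a Syntasa identifier:

```python
# Illustrative only: the Input screen is configured in the Syntasa UI.
# This dictionary mirrors the kinds of settings the article describes.
input_config = {
    "source_path": "gs://my-bucket/landing/weblogs/",  # placeholder bucket
    "file_pattern": "events_*.csv",                    # glob-style match
    "compression": "gzip",                             # e.g. none | gzip | zip
    "auto_configure": True,    # detect delimiter and schema from a sample
    "validate_files": True,    # reject files that fail schema validation
    "incremental_load": True,  # only pick up files not yet processed
}
```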
- Ingesting Delimited Files
This article focuses on ingesting structured data in delimited formats such as CSV or TSV. It covers configuring delimiters, quote characters, and escape characters so the data is parsed correctly.
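Outside of Syntasa, the same parsing choices look like this in PySpark; a minimal sketch, assuming a tab-delimited file at a placeholder path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delimited-ingest-sketch").getOrCreate()

# The three options below correspond to the choices made on the Input
# screen; the values shown are examples, not required settings.
df = (
    spark.read
    .option("header", "true")   # first row holds column names
    .option("sep", "\t")        # delimiter: tab for TSV, "," for CSV
    .option("quote", '"')       # quote character wrapping field values
    .option("escape", "\\")     # escape character inside quoted fields
    .csv("gs://my-bucket/landing/events_2024-01-01.tsv")  # placeholder path
)
df.show(5)
```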
- Ingesting Zip or Tar Files
Learn how to configure the From File process to ingest compressed Zip or Tar archives. This guide explains the steps to extract and process the contained files directly within Syntasa.
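Conceptually, the extraction step resembles the following plain-Python sketch using the standard zipfile and tarfile modules; Syntasa handles this inside the process, and the function and paths here are hypothetical:

```python
import tarfile
import zipfile
from pathlib import Path

def extract_archive(archive: str, dest: str) -> list[str]:
    """Extract a .zip or .tar(.gz) archive and list the extracted files."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    if archive.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
    else:  # assume a tar variant such as .tar or .tar.gz
        with tarfile.open(archive) as tf:
            tf.extractall(out)
    return [str(p) for p in out.rglob("*") if p.is_file()]
```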
- Incremental Load
This section explains the concept of incremental loading and how to configure the From File process to load new or updated data without reprocessing the entire dataset. This feature is essential for applications that need to process only the most recent data, improving efficiency and reducing processing time.
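A minimal sketch of the underlying idea, assuming a simple JSON file as the watermark store; Syntasa tracks processed files internally, so the file name and functions here are purely illustrative:

```python
import json
from pathlib import Path

STATE_FILE = Path("ingest_state.json")  # hypothetical watermark store

def files_to_process(candidates: list[str]) -> list[str]:
    """Return only files that have not been ingested in a previous run."""
    seen = set(json.loads(STATE_FILE.read_text())) if STATE_FILE.exists() else set()
    return [f for f in candidates if f not in seen]

def mark_processed(files: list[str]) -> None:
    """Record newly ingested files so the next run skips them."""
    seen = set(json.loads(STATE_FILE.read_text())) if STATE_FILE.exists() else set()
    STATE_FILE.write_text(json.dumps(sorted(seen | set(files))))
```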
- Date Manipulation
Learn how to configure date offsets when files contain data for a previous or future date, ensuring data is stored under the correct date in the output table. This feature is useful when data files are named with the current date but contain data for a prior or future day.
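The offset arithmetic itself is simple; a sketch assuming dates in YYYY-MM-DD format and a hypothetical helper function:

```python
from datetime import datetime, timedelta

def partition_date(file_date: str, offset_days: int) -> str:
    """Shift the date taken from a file name by a configured offset.

    offset_days = -1 handles the common case where a file named with
    today's date actually contains yesterday's data.
    """
    parsed = datetime.strptime(file_date, "%Y-%m-%d")
    return (parsed + timedelta(days=offset_days)).strftime("%Y-%m-%d")

print(partition_date("2024-06-02", -1))  # -> 2024-06-01
```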
- Creating Schema for Output Table
This article provides guidance on creating the schema for the output table that will store the processed data. It explains how to map fields from the ingested file to a structured table.
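If you were to express such a mapping in code, a PySpark schema for a hypothetical web-event file might look like this; the column names and types are assumptions for illustration, since in Syntasa the schema is defined (or auto-detected) on screen:

```python
from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, IntegerType,
)

# Each StructField maps one field from the ingested file onto a typed
# column in the structured output table.
output_schema = StructType([
    StructField("event_time",  TimestampType(), nullable=False),
    StructField("user_id",     StringType(),    nullable=True),
    StructField("page_url",    StringType(),    nullable=True),
    StructField("status_code", IntegerType(),   nullable=True),
])
```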
- Configuring Output Table
Learn how to configure the output table, including naming conventions, storage locations, and data destinations such as cloud services (e.g., BigQuery, Redshift) or on-premises systems. This section is essential for deciding where and how processed data is stored for use in downstream applications or reporting systems.
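As a sketch of the equivalent choices in PySpark, where Syntasa writes the output table for you; the destination path, partition column, and sample data below are all placeholder assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("output-table-sketch").getOrCreate()
df = spark.createDataFrame(
    [("2024-06-01", "/home", 200)],
    ["event_date", "page_url", "status_code"],
)

# Destination, write mode, and partitioning mirror the Output screen choices.
(
    df.write
    .mode("append")
    .partitionBy("event_date")                       # one partition per day
    .parquet("gs://my-bucket/warehouse/web_events")  # placeholder location
)

# With the spark-bigquery connector installed, a BigQuery destination looks
# similar (dataset, table, and bucket names are assumptions):
# df.write.format("bigquery") \
#     .option("table", "analytics.web_events") \
#     .option("temporaryGcsBucket", "my-temp-bucket") \
#     .save()
```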
- Ingesting Apache Log Files
This article explains how to ingest and extract data from Apache log files, including Common, Combined, and Custom log formats. It highlights the role of regex patterns in extracting relevant fields. You can refer to this article to configure From File for parsing Apache log files and extracting server request data.
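To make the regex idea concrete, here is a small Python sketch that parses a line in the Apache Combined Log Format; the pattern is a standard one for that format, not Syntasa's internal implementation:

```python
import re

# Apache Combined Log Format; the Common format is the same line without
# the trailing referrer and user-agent fields, which the pattern makes
# optional below.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

line = ('203.0.113.7 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08"')

match = COMBINED.match(line)
if match:
    print(match.group("host"), match.group("status"))  # 203.0.113.7 200
```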