The From File process in Syntasa allows users to ingest structured data files, typically in delimited formats such as CSV or TSV. To successfully configure the ingestion, users need to define the delimiter, quote character, and escape character. This guide explains how to set up these parameters correctly for seamless file processing.
Configuring Delimited File Ingestion
When ingesting a delimited file, the key configurations are:
- File type – Set to "TextFile" to indicate that the file is a plain text file, such as a CSV, TSV, or log file.
- Event Data Format – Set to "Delimited" to indicate that the file uses a specific character to separate fields.
- Field Delimiter – Specifies which character separates the fields in the file. Common delimiters include , (comma), | (pipe), \t (tab), and ; (semicolon).
- Quote Character – Used to enclose values, especially those containing delimiters.
- Escape Character – Handles special characters within quoted fields.
Understanding Field Delimiter
Different delimiters are used depending on the file format. Some delimiters require escaping to avoid misinterpretation by the system. The table below lists common delimiters and the value to enter for each in the Field Delimiter field:
Delimiter | Field Delimiter |
Comma (,) | , |
Tab (\t) | \t |
Pipe (|) | \| |
Caret (^) | \^ |
Double Quotes (") | " |
Single Quotes (') | ' |
Backslash (\) | \\ |
In the example below, the comma (,) separates the fields, so the Field Delimiter is set to ,
ProductID,ProductName,Price
101,Mobile,300
102,Laptop,1000
In the example below, the pipe (|) is used as the field separator, making it the delimiter. However, since the pipe character requires escaping for proper parsing, it must be entered as \| in the Field Delimiter field.
ID|Name|City
1|John Doe|New York
2|Jane Smith|Los Angeles
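The effect of the Field Delimiter setting can be sketched outside Syntasa with Python's standard csv module. This is only an illustration of delimiter-based splitting, not Syntasa's own parser; the helper name parse_delimited is hypothetical. Note that Python's csv module takes the literal character (| rather than \|), so the escaped form from the table above applies only to the Field Delimiter field in Syntasa.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Split each line of text into fields on a single-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The two sample files from this section.
comma_data = "ProductID,ProductName,Price\n101,Mobile,300\n102,Laptop,1000\n"
pipe_data = "ID|Name|City\n1|John Doe|New York\n2|Jane Smith|Los Angeles\n"

comma_rows = parse_delimited(comma_data, ",")
pipe_rows = parse_delimited(pipe_data, "|")

for row in comma_rows + pipe_rows:
    print(row)
```

Each row comes back as a list of three fields, which is the result a correctly chosen delimiter should produce for these samples.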
Understanding Quote Character
The quote character is used to enclose values that contain the field delimiter inside them.
- This helps avoid incorrect parsing when a delimiter appears inside a field value.
- In the example below, the delimiter is , (comma) and the quote character is set to " (double quotes).
ProductID,ProductName,Price
101,"Mobile, Samsung",300
102,"Laptop, Dell",1000
- Here, "Mobile, Samsung" is a single field despite having a comma inside.
- Without the quote character, the system would misinterpret this as two separate fields.
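To see the difference the quote character makes, the same sample can be parsed twice with Python's standard csv module: once with " as the quote character and once with quoting disabled. This is a sketch of the general behaviour, assuming a conventional CSV parser; it does not show how Syntasa parses files internally.

```python
import csv
import io

data = (
    "ProductID,ProductName,Price\n"
    '101,"Mobile, Samsung",300\n'
    '102,"Laptop, Dell",1000\n'
)

# Quote character set to ": a comma inside quotes stays within the field.
quoted_rows = list(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))

# Quoting disabled: every comma splits, so the product name breaks in two.
unquoted_rows = list(
    csv.reader(io.StringIO(data), delimiter=",", quoting=csv.QUOTE_NONE)
)

print(quoted_rows[1])    # three fields
print(unquoted_rows[1])  # four fields, with stray quote marks left in the values
```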
Understanding Escape Character
The escape character is used to handle special characters inside quoted text, such as quotes themselves.
- In the example below, the delimiter is , (comma), the quote character is set to " (double quotes), and the escape character is set to \\ (double backslash, the escaped form of a single backslash).
ProductID,Description
101,"Best \"Samsung\" mobile"
102,"Laptop \\ high performance"
- Here, \"Samsung\" ensures that quotes inside the text are treated as part of the value instead of closing the quoted field.
- The \\ before high performance allows the backslash itself to be stored correctly in the data.
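The escape behaviour described above can be reproduced with Python's standard csv module as a rough illustration. It assumes a parser in which the escape character makes the next character literal; Syntasa's parser may differ in detail. In Python the escape character is given as a single backslash, whereas the Field Delimiter table above suggests a backslash is entered as \\ in Syntasa.

```python
import csv
import io

# Raw strings keep the backslashes exactly as they appear in the sample file.
lines = [
    r'ProductID,Description',
    r'101,"Best \"Samsung\" mobile"',
    r'102,"Laptop \\ high performance"',
]
data = "\n".join(lines) + "\n"

rows = list(
    csv.reader(
        io.StringIO(data),
        delimiter=",",
        quotechar='"',
        escapechar="\\",    # a single backslash
        doublequote=False,  # quotes are escaped with the backslash, not doubled
    )
)

for row in rows:
    print(row[1])
```

After parsing, the first description contains literal double quotes around Samsung, and the second contains one literal backslash.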
Proper configuration of delimiters, quote characters, and escape characters ensures that data is correctly parsed during ingestion. Syntasa provides flexibility by allowing users to define these parameters based on their file structure, ensuring smooth data processing.