I have a process that automatically ingests CSV data into Snowflake Tables from an external stage. CSV File that has the delimiter as pipe (vertical bar or | )
But one cell has data that contains the Escape character itself (which is backslash - "") and immediately followed by the Column Delimiter (which is |)
Now the ingestion process is assuming that the delimiter ("|") is part of the data itself because it is immediately followed by the "" and due to this, column values are shifting left causing erratic data in the table.
The data in CSV looks like this
Column_1|Column_2|Column_3
ABC ......... |DEF..........\ |EFG
When this gets ingested into the table it looks like this
Column_1 Column_2 Column_3
ABC.......... DEF......|EFG NULL
Ideally, it should have got ingested as
Column_1 Column_2 Column_3
ABC.......... DEF.........\ EFG
Note: I can not alter the data as it has to flow from upstream systems and I need to ingest the data as is (even though it looks like a genuine data issue)
Could someone help me here as to which property of the File Format should I use in order for ingesting this data as is?
Thank you in advance!
CodePudding user response:
Setting the ESCAPE_UNENCLOSED_FIELD = NONE
for the file_format configuration might help to achieve the desired result. The entire file format should like:
create or replace file format my_csv_format
type = csv
field_delimiter = '|'
skip_header = 1
null_if = ('NULL', 'null')
ESCAPE_UNENCLOSED_FIELD = NONE
empty_field_as_null = true
ERROR_ON_COLUMN_COUNT_MISMATCH = false
compression = auto;