I am currently facing issues loading data into big query or even creating federated table as the incoming data is delimited by |
pipe symbol with escaped double quotes on fields inside the file
Sample Data (also tried escaping double quote values with double double-quotes i.e ""
on field level)
13|2|"\"Jeeps, Trucks & Off-Roa"|"JEEPSTRU"
Create DDL
CREATE OR REPLACE EXTERNAL TABLE `<project>.<dataset>.<table>`
WITH PARTITION COLUMNS (
dt DATE
)
OPTIONS (
allow_jagged_rows=true,
allow_quoted_newlines=true,
format="csv",
skip_leading_rows=1,
field_delimiter="|",
uris=["gs://path/to/bucket/table/*"],
hive_partition_uri_prefix='gs://path/to/bucket/table'
)
Query
SELECT
*
FROM
`<project>.<dataset>.<table>`
WHERE field_ like '%Jeep%'
Error
Error while reading table: <project>.<dataset>.<table>, error message: Error detected while parsing row starting at position: 70908. Error: Data between close double quote (") and field separator.
However, it works if I create the table with the option quote empty character quote=""
which makes hard to filter out on SQL query
I need the field_
data to be loaded as "Jeeps, Trucks & Off-Roa
I tried to find various documentation & StackOverflow question (since everything is old or not working - or unlucky me) I am posting this question again.
I have a very basic question > What is the better way to escape double quotes in a column for federated big query table to avoid this problem without preprocessing csv/psv raw data?
CodePudding user response:
This is not problem with external table or bigquery, but rather CSV files feature. I had similar once when I uploaded data to table in UI. I have found some sources(BTW which I cannot find right now) that double quotes should be used twice (""
) in CSV file to get such behavior, like using your exaple:
13|2|"""Jeeps, Trucks & Off-Roa"|"JEEPSTRU"
I have tested it in your sample. When I downloaded data to table from csv I got the same error. And after using above it worked as expected. Result field value is:
"Jeeps, Trucks & Off-Roa
I suppose it will work for you as well.
EDIT: I have found it in Basic Rules of CSV on Wikipedia:
- Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"