Home > database >  Bigquery - Handle Double Quotes & Pipe Field Separator in CSV (Federated Table)
Bigquery - Handle Double Quotes & Pipe Field Separator in CSV (Federated Table)

Time:09-27

I am currently facing issues loading data into big query or even creating federated table as the incoming data is delimited by | pipe symbol with escaped double quotes on fields inside the file

Sample Data (also tried escaping double quote values with double double-quotes i.e "" on field level)

13|2|"\"Jeeps, Trucks & Off-Roa"|"JEEPSTRU"

Create DDL

CREATE OR REPLACE EXTERNAL TABLE `<project>.<dataset>.<table>`
WITH PARTITION COLUMNS (
    dt DATE
    )
OPTIONS (
     allow_jagged_rows=true,
     allow_quoted_newlines=true,
     format="csv",
     skip_leading_rows=1,
     field_delimiter="|",
     uris=["gs://path/to/bucket/table/*"],
     hive_partition_uri_prefix='gs://path/to/bucket/table'
)

Query

SELECT
  *
FROM
  `<project>.<dataset>.<table>`
WHERE field_ like '%Jeep%'

Error

Error while reading table: <project>.<dataset>.<table>, error message: Error detected while parsing row starting at position: 70908. Error: Data between close double quote (") and field separator.

However, it works if I create the table with the option quote empty character quote="" which makes hard to filter out on SQL query

enter image description here

I need the field_ data to be loaded as "Jeeps, Trucks & Off-Roa

I tried to find various documentation & StackOverflow question (since everything is old or not working - or unlucky me) I am posting this question again.

I have a very basic question > What is the better way to escape double quotes in a column for federated big query table to avoid this problem without preprocessing csv/psv raw data?

CodePudding user response:

This is not problem with external table or bigquery, but rather CSV files feature. I had similar once when I uploaded data to table in UI. I have found some sources(BTW which I cannot find right now) that double quotes should be used twice ("") in CSV file to get such behavior, like using your exaple:

13|2|"""Jeeps, Trucks & Off-Roa"|"JEEPSTRU"

I have tested it in your sample. When I downloaded data to table from csv I got the same error. And after using above it worked as expected. Result field value is:

"Jeeps, Trucks & Off-Roa

I suppose it will work for you as well.

EDIT: I have found it in Basic Rules of CSV on Wikipedia:

  • Each of the embedded double-quote characters must be represented by a pair of double-quote characters.

1997,Ford,E350,"Super, ""luxurious"" truck"

  • Related