Home > Blockchain >  BigQuery - loading CSV with newlines fail with Node.js, but works with gsutil
BigQuery - loading CSV with newlines fail with Node.js, but works with gsutil

Time:05-16

I'm trying to load CSV from Cloud Storage to Big Query with a Cloud Function. The file has newlines in some of the strings.

When I load from Cloud Shell it loads perfectly and the strings are found in the table with the newlines.

bq --location=EU load \
--replace \
--allow_quoted_newlines \
--skip_leading_rows=1 \
--source_format=CSV \
my_gcp_project:my_dataset.my_table \
gs://9604/data/data.csv \
my_schema.json

In my Cloud Function I get the error,

Error while reading data, error message: Error detected while parsing row starting at position: 2624400. Error: Missing close double quote (") character.

using Cloud Function with Node.js 10 and settings that to me are the same;

const datasetId = 'my_dataset';
    const tableId = 'my_table';
    const metadata = {
      sourceFormat: 'CSV',
      skipLeadingRows: 1,
      allowQuotedNewlines: True,
      schema: {
        fields: [{"name": "gdb", "type": "STRING"},
  {"name": "venueid", "type": "STRING"},
  {"mode": "NULLABLE", "name": "buildingid", "type": "STRING"},
  {"mode": "NULLABLE", "name": "floorexists", "type": "STRING"},
  {"mode": "NULLABLE", "name": "roomid", "type": "STRING"},
  {"mode": "NULLABLE", "name": "displayname", "type": "STRING"},
  {"mode": "NULLABLE", "name": "zlevel", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "class", "type": "STRING"},
  {"mode": "NULLABLE", "name": "accesslevel", "type": "STRING"},
  {"mode": "NULLABLE", "name": "created_date", "type": "DATETIME"},
  {"mode": "NULLABLE", "name": "year", "type": "STRING"},
  {"mode": "NULLABLE", "name": "quarter", "type": "STRING"},
  {"mode": "NULLABLE", "name": "yearquarter", "type": "STRING"},
  {"mode": "NULLABLE", "name": "month", "type": "STRING"},
  {"mode": "NULLABLE", "name": "area_sqm", "type": "FLOAT"},
  {"mode": "NULLABLE", "name": "area_sqft", "type": "FLOAT"}],
      },
      // Set the write disposition to overwrite existing table data.
      writeDisposition: 'WRITE_TRUNCATE',
      location: 'EU',
    };

What am i missing? Thanks for your input!

CodePudding user response:

Found the issue. For schema I had:

 {'name': 'gdb', 'type' : 'STRING'},
 {'name': 'venueid', 'type' : 'STRING'},

Removing the '' fixed the issue. So, with GSUTIL it was fine, in JS not so much. Working schema:

{name: 'gdb', type: 'STRING'},
{name: 'venueid', type: 'STRING'}
  • Related