BigQuery | bq load with inline schema with STRING REPEATED

Time:11-03

I am trying to load a BigQuery table with the definition below; one of the columns (ref_list) is STRING REPEATED.

[
  {
    "name": "emp",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "ref_list",
    "type": "STRING"
  },
  {
    "name": "update_date",
    "type": "DATE"
  }
]

My input data looks like this:

{"emp":"Adam","ref_list":["Roger","Calvin","Andrew","Kohl"],"update_date":"1999-01-01"}
{"emp":"AntiP27","ref_list":["John","Patrick","Nick","Chris"],"update_date":"2020-01-01"}

I am able to load the table by pointing to the .schema file on my local machine, but the load fails when I provide the schema inline.

Here is my bq load command with the inline schema option. I am not sure how to specify mode = REPEATED:

bq load --replace --source_format=NEWLINE_DELIMITED_JSON emp_stage.emp_dtl gs://1324-global-delivery/emp_dtl.json emp:STRING,ref_list:STRING,update_date:DATE 

CodePudding user response:

According to the documentation, it's not possible to specify a RECORD type or a column's mode (NULLABLE, REPEATED) with an inline schema:

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.

bq_manually_specifying_schemas

If you need these options, you have to specify them in a JSON schema file, as you did in your example.
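
As a rough sketch of the schema-file approach, assuming the schema from the question is saved locally as emp_dtl_schema.json (a hypothetical file name), the path to that file takes the place of the inline schema as the last argument to bq load. The dataset, table, and bucket names are copied from the question; the check for the bq CLI is only there so the snippet does not fail on a machine without it.

```shell
# Write the schema from the question to a local file.
# The file name emp_dtl_schema.json is an assumption for illustration.
cat > emp_dtl_schema.json <<'EOF'
[
  {
    "name": "emp",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "ref_list",
    "type": "STRING"
  },
  {
    "name": "update_date",
    "type": "DATE"
  }
]
EOF

# Pass the schema file path as the last argument instead of the inline
# "emp:STRING,ref_list:STRING,update_date:DATE" string, so the REPEATED
# mode on ref_list is preserved.
if command -v bq >/dev/null 2>&1; then
  bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
    emp_stage.emp_dtl \
    gs://1324-global-delivery/emp_dtl.json \
    ./emp_dtl_schema.json
else
  echo "bq CLI not found; skipping the load step"
fi
```

Columns without an explicit "mode" in the file (emp and update_date here) default to NULLABLE, the same as with an inline schema.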
