Home > Enterprise >  How do I use BigQuery DML to transform some fields of a struct nested within an array, within a stru
How do I use BigQuery DML to transform some fields of a struct nested within an array, within a stru

Time:12-22

I think this is a more complex version of the question in BigQuery table schema from GUI

Here's the data I've inserted:

INSERT INTO `my_dataset.test_data_for_so` (date, hits)
VALUES (
  CAST('2021-01-01' AS date),
  [
    STRUCT(
      STRUCT<query STRING, other_column STRING>('foo bar', 'foo bar'),
      [
        STRUCT<key STRING, value STRING>('foo bar', 'foo bar')
      ]
    )
  ]
)

My goal is to transform the "search.query" and "metadata.value" fields. For example, uppercasing them, leaving every other column (and every other struct field) in the row unchanged.

I'm looking for a solution involving either manually specifying each column in the SQL, or preferably, one where I can only mention the columns/fields I want to transform in the SQL, omitting all other columns/fields. This is a minimal example. The table I'm working on in production has hundreds of columns and fields.

For example, that row, when transformed this way, would change from:

[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "foo bar",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "foo bar"
          }
        ]
      }
    ]
  }
]

to:

[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "FOO BAR",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "FOO BAR"
          }
        ]
      }
    ]
  }
]

CodePudding user response:

preferably, one where I can only mention the columns/fields I want to transform in the SQL ...

Use below approach - it does exactly what you wish - ONLY those fields that are to be updated are in use, all other (tens or hundreds ...) are preserved as is

update your_table
set hits = array(
  select as struct *
    replace(
      (select as struct * replace (upper(query) as query) from unnest([search])) as search, 
      array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
    )
  from unnest(hits)
)
where true;            

if applied to sample data in your question - result is

enter image description here

  • Related