Home > Net >  How to flatten an Array Struct to columns in Google BigQuery
How to flatten an Array Struct to columns in Google BigQuery

Time:12-15

Consider the table sample below where data is of type array<struct<key:string,value:string>>, with the repeated keys: 'Date', 'Country' and 'Brand':

source data.key data.value
first_file Date 2022-12-14
Country Germany
Brand Mercedes
Date 2022-12-15
Country Germany
Brand BMW
second_file Date 2022-12-13
Country Sweden
Brand Volvo
Date 2022-12-10
Country France
Brand Renault

By 'repeated' keys I mean that every data.key entry always consists of those keys (Date, Country, Brand). In this example, they get repeated twice per row entry, but in the real table they might get repeated even more times per unique entry. My desired result is:

source date country brand
first_file 2022-12-14 Germany Mercedes
first_file 2022-12-15 Germany BMW
second_file 2022-12-13 Sweden Volvo
second_file 2022-12-10 France Renault

Any help on how I can reach that result?


If it helps, I've managed to turn the sample table into the format below in case you'd like to try a solution to this table instead:

source date.key date.value country.key country.value brand.key brand.value
first_file Date 2022-12-14 Country Germany Brand Mercedes
first_file Date 2022-12-15 Country Germany Brand BMW
second_file Date 2022-12-13 Country Sweden Brand Volvo
second_file Date 2022-12-10 Country France Brand Renault

Thanks!

CodePudding user response:

WITH
  tmp AS (
  SELECT
    source,
    key,
    value
  FROM
    UNNEST(ARRAY<STRUCT<source string, data ARRAY<STRUCT<key string, value string>>>>[ 
      ("first_file", [("Date","2022-12-14"), ("Country","Germany"),("Brand","Mercedes")]),
      ("second_file", [("Date","2022-12-13"), ("Country","Sweden"),("Brand","Volvo")])
      ]),
    UNNEST(data) ) -- unnest data first
SELECT
  source, date[SAFE_OFFSET(0)] as date, country[SAFE_OFFSET(0)] as country, brand[SAFE_OFFSET(0)] as brand, 
FROM
  tmp PIVOT (
    ARRAY_AGG(value IGNORE NULLS) FOR key IN ("Date", "Country", "Brand")) -- pivot table

CodePudding user response:

You can use the following query :

with sources AS 
(
  select 
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select 
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)

select
  source,
  (SELECT value FROM UNNEST(data) WHERE key = 'Date') AS Date,
  (SELECT value FROM UNNEST(data) WHERE key = 'Country') AS Country,
  (SELECT value FROM UNNEST(data) WHERE key = 'Brand') AS Brand,
from sources;

You can also use an udf to centralize the logic :

CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));

with sources AS 
(
  select 
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select 
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)
SELECT 
  source,
  getValue('Date', data) AS Date,
  getValue('Country', data) AS Country,
  getValue('Brand', data) AS Brand
FROM sources;

The result is :

enter image description here

  • Related