Consider the table sample below where data
is of type array<struct<key:string,value:string>>
, with the repeated keys
: 'Date', 'Country' and 'Brand':
source | data.key | data.value |
---|---|---|
first_file | Date | 2022-12-14 |
Country | Germany | |
Brand | Mercedes | |
Date | 2022-12-15 | |
Country | Germany | |
Brand | BMW | |
second_file | Date | 2022-12-13 |
Country | Sweden | |
Brand | Volvo | |
Date | 2022-12-10 | |
Country | France | |
Brand | Renault |
By 'repeated' keys I mean that every data.key entry always consists of those keys (Date, Country, Brand). In this example, they get repeated twice per row entry, but in the real table they might get repeated even more times per unique entry. My desired result is:
source | date | country | brand |
---|---|---|---|
first_file | 2022-12-14 | Germany | Mercedes |
first_file | 2022-12-15 | Germany | BMW |
second_file | 2022-12-13 | Sweden | Volvo |
second_file | 2022-12-10 | France | Renault |
Any help on how I can reach that result?
If it helps, I've managed to turn the sample table into the format below in case you'd like to try a solution to this table instead:
source | date.key | date.value | country.key | country.value | brand.key | brand.value |
---|---|---|---|---|---|---|
first_file | Date | 2022-12-14 | Country | Germany | Brand | Mercedes |
first_file | Date | 2022-12-15 | Country | Germany | Brand | BMW |
second_file | Date | 2022-12-13 | Country | Sweden | Brand | Volvo |
second_file | Date | 2022-12-10 | Country | France | Brand | Renault |
Thanks!
CodePudding user response:
WITH
tmp AS (
SELECT
source,
key,
value
FROM
UNNEST(ARRAY<STRUCT<source string, data ARRAY<STRUCT<key string, value string>>>>[
("first_file", [("Date","2022-12-14"), ("Country","Germany"),("Brand","Mercedes")]),
("second_file", [("Date","2022-12-13"), ("Country","Sweden"),("Brand","Volvo")])
]),
UNNEST(data) ) -- unnest data first
SELECT
source, date[SAFE_OFFSET(0)] as date, country[SAFE_OFFSET(0)] as country, brand[SAFE_OFFSET(0)] as brand,
FROM
tmp PIVOT (
ARRAY_AGG(value IGNORE NULLS) FOR key IN ("Date", "Country", "Brand")) -- pivot table
CodePudding user response:
You can use the following query :
with sources AS
(
select
'first_file' as source,
[
struct('Date' as key, '2022-12-14' as value),
struct('Country' as key, 'Germany' as value),
struct('Brand' as key, 'Mercedes' as value)
] as data
UNION ALL
select
'second_file' as source,
[
struct('Date' as key, '2022-12-13' as value),
struct('Country' as key, 'Sweden' as value),
struct('Brand' as key, 'Volvo' as value)
] as data
)
select
source,
(SELECT value FROM UNNEST(data) WHERE key = 'Date') AS Date,
(SELECT value FROM UNNEST(data) WHERE key = 'Country') AS Country,
(SELECT value FROM UNNEST(data) WHERE key = 'Brand') AS Brand,
from sources;
You can also use an udf
to centralize the logic :
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
with sources AS
(
select
'first_file' as source,
[
struct('Date' as key, '2022-12-14' as value),
struct('Country' as key, 'Germany' as value),
struct('Brand' as key, 'Mercedes' as value)
] as data
UNION ALL
select
'second_file' as source,
[
struct('Date' as key, '2022-12-13' as value),
struct('Country' as key, 'Sweden' as value),
struct('Brand' as key, 'Volvo' as value)
] as data
)
SELECT
source,
getValue('Date', data) AS Date,
getValue('Country', data) AS Country,
getValue('Brand', data) AS Brand
FROM sources;
The result is :