I have a records
table:
CREATE TABLE `records`
(
`ID1` INT64,
`ID2` DATE,
`EVENT` INT64,
);
This table has some records like:
INSERT `records` (ID1, ID2, EVENT)
VALUES (1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 222),
(1, PARSE_DATE('%Y-%m-%d', '2022-03-15'), 333),
(1, PARSE_DATE('%Y-%m-%d', '2022-03-15'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-07-22'), 444),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 999),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 888),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 888),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 555),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 666),
(2, PARSE_DATE('%Y-%m-%d', '2022-07-22'), 111)
And I have a separate dictionary (or vocabulary) table:
CREATE TABLE `t_vocabs`
(
`EVENT` INT64,
`DESCRIPTION` STRING,
);
Table t_vocabs
contains all unique values for EVENT
:
INSERT `t_vocabs` (EVENT, DESCRIPTION)
VALUES (111, 'desc1'),
(222, 'desc2'),
(333, 'desc3'),
(444, 'desc4'),
(555, 'desc5'),
(666, 'desc6'),
(777, 'desc7'),
(888, 'desc8'),
(999, 'desc9'),
(123, 'desc10'),
(456, 'desc111')
Therefore, records
would look like this:
And the vocabulary table would look like this:
I need to convert records
to multi-hot encoding in which for each ID1
and ID2
I have one record, with the number of columns equal to the unique number of events in t_vocabs
and the values equal to 1 when that relevant event exists in records
and 0 otherwise.
For this example, it'd be like:
Note, I have tried
SELECT
*
FROM
`records`
PIVOT
(COUNT(*) FOR EVENT IN (111, 222, 333, 444, 555, 666, 777, 888, 999, 123, 456))
And this does the job, but the issue is that the t_vocabs
is a big table and I can't list all EVENT
values manually in this query. When I try to select events from another table (e.g. following query) it doesn't like the select:
SELECT *
FROM `records`
PIVOT(COUNT(*) FOR EVENT IN (SELECT EVENT FROM `t_vocabs`))
CodePudding user response:
Use below
execute immediate (select '''
select * from records
pivot (count(*) for event in (''' || string_agg('' || event, ',' order by event) || "))"
from t_vocabs
)
if applied to sample data in your question - output is