I have been trying to make a UDF in BigQuery to compress multiple rows into a single row
CREATE OR REPLACE FUNCTION function_name(to_compress_column INT64, order_by_column INT64) AS (
TO_JSON_STRING(
ARRAY_AGG(
IFNULL(to_compress_column,-1) RESPECT NULLS
ORDER BY order_by_column
)
)
);
But this causes the error "ARRAY_AGG not allowed in SQL function body". I tried using SELECT and UNNEST but since my input is not an ARRAY, it did not work. How should I proceed?
Edit:
My input tables have this format:
visits | place_id | date | hour |
---|---|---|---|
23 | abc123 | 2022-01-01 | 4 |
20 | abc123 | 2022-01-01 | 2 |
19 | abc123 | 2022-01-01 | 3 |
24 | abc123 | 2022-01-01 | 1 |
26 | abc123 | 2022-01-01 | 5 |
18 | abc456 | 2022-01-01 | 2 |
20 | abc456 | 2022-01-01 | 3 |
17 | abc456 | 2022-01-01 | 1 |
I want my output tables to have this format:
visits | place_id | date |
---|---|---|
[24,20,19,23,26] | abc123 | 2022-01-01 |
[17,18,20] | abc456 | 2022-01-01 |
I understand that I can do this by
SELECT TO_JSON_STRING(ARRAY_AGG(IFNULL(visits,-1) RESPECT NULLS
ORDER BY hour)) visits,
place_id,
date
from input_table
group by place_id, date
But in order to avoid repetition of the first line of the select query as there a lot of tables I have to do this in, I want to create a UDF so that the query changes to:
SELECT function_name(visits,hour) visits,
place_id,
date
from input_table
group by place_id, date
CodePudding user response:
what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so aggregate udf direction looks like dead end here
What I see as an option is described in below example
create temp function func_name (arr any type) as ((
select to_json_string(array_agg(ifnull(visits,-1) respect nulls order by hour))
from unnest(arr)
));
select func_name(array_agg(struct(visits, hour))) visits,
place_id,
date
from input_table
group by place_id, date
if applied to sample data in your question - output is
So, it is not 100% of what you wanted , but very close in my mind
So, instead of desired line
function_name(visits,hour) visits,
you will have to use
function_name(array_agg(struct(visits, hour))) visits,
I feel like this is reasonable trade-off :o)
P.S. And, obviously, instead of temp function you will create permanent function ...