Home > Blockchain >  Aggregate function ARRAY_AGG not allowed in SQL function
Aggregate function ARRAY_AGG not allowed in SQL function

Time:09-04

I have been trying to make a UDF in BigQuery to compress multiple rows into a single row

CREATE OR REPLACE FUNCTION function_name(to_compress_column INT64, order_by_column INT64) AS (
    TO_JSON_STRING(
           ARRAY_AGG(
                IFNULL(to_compress_column,-1) RESPECT NULLS
                ORDER BY order_by_column
                    )
                  )
    );

But this causes the error "ARRAY_AGG not allowed in SQL function body". I tried using SELECT and UNNEST but since my input is not an ARRAY, it did not work. How should I proceed?

Edit:

My input tables have this format:

visits place_id date hour
23 abc123 2022-01-01 4
20 abc123 2022-01-01 2
19 abc123 2022-01-01 3
24 abc123 2022-01-01 1
26 abc123 2022-01-01 5
18 abc456 2022-01-01 2
20 abc456 2022-01-01 3
17 abc456 2022-01-01 1

I want my output tables to have this format:

visits place_id date
[24,20,19,23,26] abc123 2022-01-01
[17,18,20] abc456 2022-01-01

I understand that I can do this by

SELECT TO_JSON_STRING(ARRAY_AGG(IFNULL(visits,-1) RESPECT NULLS
  ORDER BY hour)) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date

But in order to avoid repetition of the first line of the select query as there a lot of tables I have to do this in, I want to create a UDF so that the query changes to:

SELECT function_name(visits,hour) visits, 
      place_id, 
      date 
      from input_table
      group by place_id, date

CodePudding user response:

what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so aggregate udf direction looks like dead end here

What I see as an option is described in below example

create temp function func_name (arr any type) as (( 
  select to_json_string(array_agg(ifnull(visits,-1) respect nulls order by hour))
  from unnest(arr) 
));
select func_name(array_agg(struct(visits, hour))) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date      

if applied to sample data in your question - output is

enter image description here

So, it is not 100% of what you wanted , but very close in my mind
So, instead of desired line

function_name(visits,hour) visits,    

you will have to use

function_name(array_agg(struct(visits, hour))) visits,

I feel like this is reasonable trade-off :o)

P.S. And, obviously, instead of temp function you will create permanent function ...

  • Related