Home > Software engineering >  Create batches of IDs in BigQuery with fixed number of ids in each batch and sorted order
Create batches of IDs in BigQuery with fixed number of ids in each batch and sorted order

Time:11-30

I want to create a string aggregation of ids in batch groups from a table with 2 columns.

The source table has 2 columns with around 100k rows and I want to create 100 batches with about 1000 comma separated ids in each batch. They should be sorted in order of descending score.

Start table:

ID Score
22334 725543
15432 65293
23456 17354
91274 2263

This is what I want to create:

batch_grp id_str (1000ids on each row. score is just for sorting and is left out
b00001 11223,33443,445566,22333,56566...
b00002 65293,33343,4674335,67656,8877887,55666...
b00003 17354,54,7777,55,333235,6656665...

This is what I have which does create batches but just not equal size.

WITH x as( SELECT ID, RANK() OVER(ORDER by SCORE DESC) as rank FROM TABLESCORES )

SELECT REPLACE(LPAD(CAST(CAST(CEILING(rank / 100) as int64) as string),6),' ','0') as BATCH_GRP,

STRING_AGG(ID) as ID_STR FROM x

GROUP BY BATCH_GRP

CodePudding user response:

Try below approach

select div(pos - 1, 1000)   1 as batch_grp, 
  string_agg('' || ID, ',' order by Score) as id_str
from (
  select *, row_number() over() pos
  from your_table
)
group by batch_grp
  • Related