I am trying to write BigQuery SQL to apply sum() against a particular column until it reaches the threshold value, then a summarized record should be the output. The process will continue until all the records in the tables are processed and summarized, even if the sum value does not reach the threshold value.
Below is how my data will look like in the table
Expected output:
If the Threshold value is 10000000 then the SQL should give 2 rows as output for id=1. If we add the total_amount column of the first 2 records of id=1 in my raw data then it reaches the threshold value 10000000 so the output needs to be displayed as from_range as 1 and to_range as 20 and the total_amount as 10000000. Now the SQL needs to reset and do the same operation for the remaining rows of id=1 which comes to the total_amount of 210000 with from_range as 21 and to_range as 40.
But all I could achieve is to get cumulative running total_amount with the below query.
Select id, from_range, to_range, total_amount, SUM(total_amount) OVER (PARTITION BY id ORDER BY from_range asc ROWS UNBOUNDED PRECEDING) as aggregated_total_amount From Test
I've referred stack overflow questions(
CodePudding user response:
I think you can partition ranges not to be greater than threshold using an UDF like below:
CREATE TEMP FUNCTION cumsumbin(a ARRAY<INT64>) RETURNS INT64
LANGUAGE js AS """
bin = 0;
a.reduce((c, v) => {
if (c Number(v) > 10000000) { bin = 1; return Number(v); }
else return c = Number(v);
}, 0);
return bin;
""";
and then aggregate total_amount for each bin of id:
WITH binning AS (
SELECT *, cumsumbin(ARRAY_AGG(total_amount) OVER (PARTITION BY id ORDER BY from_range)) bin
FROM sample_table
)
SELECT id,
MIN(from_range) from_range,
MAX(to_range) to_range,
SUM(total_amount) total_amount
FROM binning
GROUP BY id, bin;