In a BigQuery table, I need to refer to the previous value from the same column to calculate the new value for that column like this:
Timestamp | Liters per Minute | Liters remaining |
---|---|---|
2022-08-15 08:50 UTC | 5 | 1000 |
2022-08-15 08:51 UTC | 4 | 1000 - (avg(5, 4) = 4.5) = 995.5 |
2022-08-15 08:51 UTC | 6 | 995.5 - (avg(4, 6) = 5) = 990.5 |
2022-08-15 08:51 UTC | 6 | 990.5 - (avg(6, 6) = 6) = 984.5 |
Sometimes the 'Liters remaining' will come from outside, so I actually have used coalesce to choose either this external actual value if it exists, or the previous value in time for this column if it doesn't:
liters remaining estimated = coalesce(Liters remaining actual, lag(liters remaining estimated) - liters per minute)
but I can't refer to 'self' (liters remaining estimated) in the calculation (I think!)
Timestamp | Liters per Minute | Liters remaining actual | Liters remaining estimated |
---|---|---|---|
2022-08-15 08:50 UTC | 5 | 1000 | 1000 |
2022-08-15 08:51 UTC | 4 | 1000 - (avg(5, 4) = 4.5) = 995.5 | |
2022-08-15 08:51 UTC | 6 | 995.5 - (avg(4, 6) = 5) = 990.5 | |
2022-08-15 08:51 UTC | 6 | 990.5 - (avg(6, 6) = 6) = 984.5 | |
2022-08-15 08:51 UTC | 6 | 986 | 986 |
2022-08-15 08:51 UTC | 3 | 986 - (avg(6, 3) = 4.5) 981.5 |
I'm not sure how to split this up or if there is some magic bq sql I'm about to learn :)
CodePudding user response:
Consier below query. I've slightly modified your liters remaining logic like below and applied kind of cumulative sum analytic function to generate expected output.
- 1000 - SUM(-5, 5) /2 = 1000
- 1000 - SUM(5, 4) / 2 = 995.5
- 1000 - SUM(5, 4, 4, 6) / 2 = 990.5
- 1000 - SUM(5, 4, 4, 6, 6, 6) / 2 = 984.5
actual_liters column is used to partition a table whenever new actual liters is given.
WITH partitioned AS (
SELECT *,
LAST_VALUE(Liters_remaining_actual IGNORE NULLS) OVER w AS actual_liters,
IF(Liters_remaining_actual IS NULL,
LAG(Liters_per_Minute) OVER w, -1 * Liters_per_Minute
) AS pre_liters_per_minute,
FROM sample_table
WINDOW w AS (ORDER BY Timestamp)
)
SELECT *,
actual_liters - SUM(Liters_per_Minute pre_liters_per_minute) OVER w / 2 AS Liters_remaining_estimated
FROM partitioned
WINDOW w AS (PARTITION BY actual_liters ORDER BY Timestamp)
ORDER BY Timestamp;