My goal is to query the delta of a counter metric over time using a continuous aggregate in TimescaleDB.
I'm following the example from the docs: "Running a counter aggregate query with a continuous aggregate".
I created the table:
CREATE TABLE example (
    measure_id BIGINT,
    ts         TIMESTAMPTZ,
    val        DOUBLE PRECISION,
    PRIMARY KEY (measure_id, ts)
);
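(One step not shown above: if you are following the TimescaleDB docs, you would normally convert the table to a hypertable before inserting data. A minimal sketch, assuming the default chunking is fine — the composite primary key works here because it includes the `ts` partitioning column:)

```sql
-- Convert the plain table into a hypertable partitioned on ts.
SELECT create_hypertable('example', 'ts');
```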
Inserted some values for testing:
INSERT INTO example VALUES
(1, '2022-05-07 10:00', 0),
(1, '2022-05-07 10:05', 5),
(1, '2022-05-07 10:10', 10),
(1, '2022-05-07 10:15', 15),
(1, '2022-05-07 10:20', 20),
(1, '2022-05-07 10:25', 25);
Then I run the query that puts this data into time buckets and uses the delta accessor function to calculate the change of the measurements within each bucket:
SELECT measure_id,
       time_bucket('15 min'::interval, ts) AS bucket,
       delta(counter_agg(ts, val, toolkit_experimental.time_bucket_range('15 min'::interval, ts)))
FROM example
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
But the results don't match my expectations:
| measure_id | bucket                 | delta |
|------------|------------------------|-------|
| 1          | 2022-05-07 10:00:00+01 | 10    |
| 1          | 2022-05-07 10:15:00+01 | 10    |
The delta accessor works correctly within each bucket, as it subtracts the first value of the bucket from the last:
10 - 0 = 10
and
25 - 15 = 10
But the overall delta is not correct: the metric grew from 0 to 25, which is 25, not 10 + 10 = 20. The jump from 10 to 15 at the bucket boundary is lost.
How can I get the "proper" delta?
CodePudding user response:
It looks as though you may have encountered a known issue. Although it's not ideal for Stack Overflow, there are some very long conversations at the two links at the end of this answer; they're too detailed to reproduce here accurately. I will, however, explain the source of the problem. A workaround is currently in development (July 2022), and you can find out more about it at the second link.
Unfortunately, these aggregates have a major weakness when data is grouped by time range, which is a very common use case. When Postgres groups data, each point is uniquely assigned to one group, and a function called on that group has no visibility into adjacent groups. This breaks our ability to do pairwise computations at the boundaries between groups.
We should be able to work around this using our pipeline functionality, but designing and maturing a robust solution around it will take significant effort (and it might not even be viable for mega-scale workloads). Fortunately, Postgres does have a mechanism for this problem that we can leverage for a more immediate workaround: window functions.
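As a rough illustration of that idea (my own sketch, not official toolkit code; it uses TimescaleDB's `first()`/`last()` aggregates and ignores counter resets), a window function can carry each bucket's closing value into the next bucket, so the change across the bucket boundary is counted too:

```sql
-- Per bucket, take the first and last raw values, then use LAG to
-- subtract the previous bucket's last value instead of this bucket's
-- first value, so boundary deltas are not lost.
SELECT measure_id,
       bucket,
       last_val - COALESCE(
           LAG(last_val) OVER (PARTITION BY measure_id ORDER BY bucket),
           first_val  -- no previous bucket: fall back to this bucket's first value
       ) AS delta
FROM (
    SELECT measure_id,
           time_bucket('15 min'::interval, ts) AS bucket,
           first(val, ts) AS first_val,
           last(val, ts)  AS last_val
    FROM example
    GROUP BY 1, 2
) AS per_bucket;
```

With the sample data in the question this yields per-bucket deltas of 10 and 15, which sum to the expected 25.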
- https://github.com/timescale/timescaledb-toolkit/issues/358
- https://github.com/timescale/timescaledb-toolkit/issues/440
Transparency: I work for Timescale
CodePudding user response:
Thanks to the comments from @greenweeds I worked out this solution, which seems to
- account for counter resets, and
- ignore the initial counter value (I only want differences over time, and the first value would otherwise be counted as such a difference even though it is not).
Hope this helps others as well. My query:
SELECT
    measure_id,
    time_bucket('15 min', ts) AS bucket,
    SUM(diff_val) AS delta
FROM (
    SELECT
        ts,
        measure_id,
        -- Use 0 on counter resets (a negative diff means the counter restarted)
        CASE WHEN diff_val >= 0 THEN diff_val
             ELSE 0
        END AS diff_val
    FROM (
        SELECT
            ts,
            measure_id,
            -- Change in the counter from the previous value to the current one,
            -- computed per measure_id (defaults to 0 for the first row)
            val - LAG(val, 1, 0) OVER (
                PARTITION BY measure_id
                ORDER BY ts
            ) AS diff_val
        FROM example
    ) AS base
    -- Drop each measure's oldest row: it contains the initial counter value,
    -- which we ignore as it could be very high and make the final chart unusable
    WHERE (measure_id, ts) NOT IN (SELECT measure_id, MIN(ts) FROM example GROUP BY measure_id)
) AS final
GROUP BY 1, 2
ORDER BY 1, 2;
I tested this against a more complex data set (after clearing the table first, since these timestamps collide with the earlier rows on the primary key):
INSERT INTO example VALUES
(1, '2022-05-07 10:00', 1000),
(1, '2022-05-07 10:05', 1005),
(1, '2022-05-07 10:10', 1010),
(1, '2022-05-07 10:12', 0),
(1, '2022-05-07 10:15', 5),
(1, '2022-05-07 10:20', 10),
(1, '2022-05-07 10:31', 15);
And the result is what I expected:
| measure_id | bucket                 | delta |
|------------|------------------------|-------|
| 1          | 2022-05-07 10:00:00+01 | 10    |
| 1          | 2022-05-07 10:15:00+01 | 10    |
| 1          | 2022-05-07 10:30:00+01 | 5     |