I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day interval or 7 days at 1h interval etc. date_trunc() partly solves this, but rounds the weeks to the nearest monday e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
I managed to return the data I need using generate_series() and CROSS JOIN LATERAL but the query is really slow (3s ) compared to date_trunc() which was returning in 150ms:
SELECT *
FROM (
SELECT generate_series('2021-01-07T00:00:00.000Z', '2022-01-06T23:59:59.999Z', interval '7 day') AS ts
) grid
CROSS JOIN LATERAL (
SELECT avg("GAS") AS gas,
avg("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= grid.ts
AND "DATETIME" < grid.ts interval '7 day'
) c;
ts gas coal
"2021-01-07 00:00:00 00" 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00 00" 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00 00" 15278.854166666666 1592.7916666666667
Any ideas how to query this more effectively please? Thanks.
CodePudding user response:
In order to select the complete week, you should change the WHERe-clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) interval '7' day)::date
This will effectively get the records from January 4,2021
until (and including ) January 9,2022
Note: I changed <=
to <
to stop the end-date being included!
EDIT:
when you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d
is the column containing the datetime-value.)
see: dbfiddle
CodePudding user response:
Using the function EXTRACT you may calculate the difference in days, weeks and hours between your timestamp ts
and the start_date
as follows
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
Is the difference in day divided by 7 and truncated
trunc(extract (day from ts - start_date)/7)
Difference in Hours
Is the difference in day times 24 the difference in hours of the day
extract (day from ts - start_date)*24 extract (hour from ts - start_date)
The difference can be used in GROUP BY
directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.
Sample Example
I'm using a CTE for the start date to avoid multpile copies of the paramater
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1