How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1-CodePudding

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:

I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day interval or 7 days at 1h interval etc. date_trunc() partly solves this, but rounds the weeks to the nearest monday e.g.

SELECT date_trunc('week', "DATETIME") AS week, 
count(*), 
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;

returns the first time series interval as 2021-01-04 with an incorrect count:

week                    count  gas                 coal
"2021-01-04 00:00:00"   192    18291.34375         2321.4427083333335
"2021-01-11 00:00:00"   336    14477.407738095239  2027.547619047619
"2021-01-18 00:00:00"   336    13947.044642857143  1152.047619047619

I managed to return the data I need using generate_series() and CROSS JOIN LATERAL but the query is really slow (3s ) compared to date_trunc() which was returning in 150ms:

SELECT *
FROM  (
   SELECT generate_series('2021-01-07T00:00:00.000Z', '2022-01-06T23:59:59.999Z', interval '7 day') AS ts
) grid
CROSS JOIN LATERAL (
   SELECT avg("GAS") AS gas,
          avg("COAL") AS coal
   FROM   "Generation"
   WHERE  "DATETIME" >= grid.ts
   AND    "DATETIME" <  grid.ts   interval '7 day'
) c;

ts                          gas                 coal
"2021-01-07 00:00:00 00"    17242.752976190477  2293.8541666666665
"2021-01-14 00:00:00 00"    13481.497023809523  1483.0565476190477
"2021-01-21 00:00:00 00"    15278.854166666666  1592.7916666666667

Any ideas how to query this more effectively please? Thanks.

CodePudding user response：

In order to select the complete week, you should change the WHERe-clause to something like:

WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp) 
  AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp)   interval '7' day)::date

This will effectively get the records from January 4,2021 until (and including ) January 9,2022

Note: I changed <= to < to stop the end-date being included!

EDIT:

when you want your weeks to start on January 7, you can always group by:

(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7

(where d is the column containing the datetime-value.)

see: dbfiddle

CodePudding user response：

Using the function EXTRACT you may calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows

Difference in Days

extract (day from ts - start_date)

Difference in Weeks

Is the difference in day divided by 7 and truncated

trunc(extract (day from ts - start_date)/7)

Difference in Hours

Is the difference in day times 24 the difference in hours of the day

extract (day from ts - start_date)*24   extract (hour from ts - start_date)

The difference can be used in GROUP BY directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.

Sample Example

I'm using a CTE for the start date to avoid multpile copies of the paramater

with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select 
  ts,
  extract (day from ts - (select start_ts from start_time)) day_diff,
  trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
  extract (day from ts - (select start_ts from start_time)) *24   extract (hour from ts - (select start_ts from start_time)) hour_diff,
  value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1