Home > Enterprise >  Generate step time-series in SQL (MS SQL / PostgreSQL)
Generate step time-series in SQL (MS SQL / PostgreSQL)

Time:09-01

We are storing data corresponding to rates (ex: electricity price) in a SQL table, such as:

Date Value
2022-08-25 01:00 12.3
2022-09-23 06:12 14.5
2022-10-18 05:34 9.8

The date interval between two rows is not regular. In this table, 12.3 is the current rate until it's replaced by the new value on September 23rd, when the rate becomes 14.5

From there, we want to generate an hourly time-series, with each value corresponding to the correct rate, such as:

Date Value
2022-08-25 01:00 12.3
2022-08-25 02:00 12.3
2022-08-25 03:00 12.3
2022-08-25 04:00 12.3
2022-08-25 05:00 12.3
... 12.3
2022-09-23 06:12 14.5
2022-09-23 07:00 14.5
2022-09-23 08:00 14.5
... 14.5
2022-10-18 05:34 9.8
... 9.8

how you would generate such as time-series in MS SQL or PostgreSQL ?

CodePudding user response:

So you need to do two things: generate the time series with hourly intervals and then check for each interval which value was active during that.

For Postgres I would also create a timestamp range that contains the start and end of the range in which the price is valid (excluding the upper bound). This can be used in a join condition against the generated time series

with time_series ("date") as (
  select g.*
  from ( 
    select min("date") as start_date, max("date") as end_date
    from the_table
  ) x 
    cross join generate_series(x.start_date, x.end_date, interval '1 hour') as g
), ranges as (
  select tsrange("date", lead("date") over (order by "date"), '(]') as valid_during, 
         value
  from the_table
)
select ts."date",
       r.value 
from time_series ts
  join ranges r on r.valid_during @> ts."date"

If you don't really need a "dynamic time series", you can just use generate_series() with a hard-coded start and end which would simplify this a bit.

Online example

CodePudding user response:

This is solution for Postgres. I think it's what you wanted, the intervals end with full hour and after generation ends the next hour is exact timestamp from the original table (see table). It was done through comparison of the generated date with original date truncated to the hours. To make sure that the last date appears in the result I made COALESCE on LAG window function to fill the NULL value with the last date. Hope it doesn't look too hacky.

hourly_interval value
2022-08-25 01:00:00 12.3
2022-08-25 02:00:00 12.3
... ...
2022-09-23 06:00:00 12.3
2022-09-23 06:12:00 14.5
2022-09-23 07:00:00 14.5
... ...
2022-10-18 05:00:00 14.5
2022-10-18 05:34:00 9.8

The result has 1303 rows

WITH cte AS (
    SELECT *,
    date_trunc('hour',generate_series(date,
        COALESCE((LAG(date,-1) OVER (ORDER BY date)),date),
        '1 hour')) hourly_interval
    FROM electricity
) 
SELECT
    CASE WHEN
        hourly_interval = date_trunc('hour',date)
    THEN
        date
    ELSE
        hourly_interval
    END AS hourly_interval,
    value
FROM cte

Feel free to fiddle around

  • Related