We are storing data corresponding to rates (ex: electricity price) in a SQL table, such as:
Date | Value |
---|---|
2022-08-25 01:00 | 12.3 |
2022-09-23 06:12 | 14.5 |
2022-10-18 05:34 | 9.8 |
The date interval between two rows is not regular. In this table, 12.3 is the current rate until it's replaced by the new value on September 23rd, when the rate becomes 14.5
From there, we want to generate an hourly time-series, with each value corresponding to the correct rate, such as:
Date | Value |
---|---|
2022-08-25 01:00 | 12.3 |
2022-08-25 02:00 | 12.3 |
2022-08-25 03:00 | 12.3 |
2022-08-25 04:00 | 12.3 |
2022-08-25 05:00 | 12.3 |
... | 12.3 |
2022-09-23 06:12 | 14.5 |
2022-09-23 07:00 | 14.5 |
2022-09-23 08:00 | 14.5 |
... | 14.5 |
2022-10-18 05:34 | 9.8 |
... | 9.8 |
how you would generate such as time-series in MS SQL or PostgreSQL ?
CodePudding user response:
So you need to do two things: generate the time series with hourly intervals and then check for each interval which value
was active during that.
For Postgres I would also create a timestamp range that contains the start and end of the range in which the price is valid (excluding the upper bound). This can be used in a join condition against the generated time series
with time_series ("date") as (
select g.*
from (
select min("date") as start_date, max("date") as end_date
from the_table
) x
cross join generate_series(x.start_date, x.end_date, interval '1 hour') as g
), ranges as (
select tsrange("date", lead("date") over (order by "date"), '(]') as valid_during,
value
from the_table
)
select ts."date",
r.value
from time_series ts
join ranges r on r.valid_during @> ts."date"
If you don't really need a "dynamic time series", you can just use generate_series()
with a hard-coded start and end which would simplify this a bit.
CodePudding user response:
This is solution for Postgres. I think it's what you wanted, the intervals end with full hour and after generation ends the next hour is exact timestamp from the original table (see table). It was done through comparison of the generated date with original date truncated to the hours. To make sure that the last date appears in the result I made COALESCE
on LAG
window function to fill the NULL value with the last date. Hope it doesn't look too hacky.
hourly_interval | value |
---|---|
2022-08-25 01:00:00 | 12.3 |
2022-08-25 02:00:00 | 12.3 |
... | ... |
2022-09-23 06:00:00 | 12.3 |
2022-09-23 06:12:00 | 14.5 |
2022-09-23 07:00:00 | 14.5 |
... | ... |
2022-10-18 05:00:00 | 14.5 |
2022-10-18 05:34:00 | 9.8 |
The result has 1303 rows
WITH cte AS (
SELECT *,
date_trunc('hour',generate_series(date,
COALESCE((LAG(date,-1) OVER (ORDER BY date)),date),
'1 hour')) hourly_interval
FROM electricity
)
SELECT
CASE WHEN
hourly_interval = date_trunc('hour',date)
THEN
date
ELSE
hourly_interval
END AS hourly_interval,
value
FROM cte
Feel free to fiddle around