I have the next forecast table:
| id | timestamp | name | temp |
------------------------------------------------------------------
| 1 | 2022-01-16 12:40:06 | Bancal 1 | 22 |
| 2 | 2022-01-16 12:58:05 | Bancal 1 | 21 |
| 3 | 2022-01-16 13:22:00 | Bancal 1 | 30 |
| 4 | 2022-01-16 13:30:20 | Bancal 1 | 10 |
| 5 | 2022-01-16 13:59:06 | Bancal 1 | 15 |
| 6 | 2022-01-16 15:40:00 | Bancal 2 | 15 |
| 7 | 2022-01-16 15:54:06 | Bancal 1 | 18 |
| 8 | 2022-01-17 10:30:05 | Bancal 2 | 23 |
| 9 | 2022-01-17 11:20:00 | Bancal 1 | 12 |
| 10 | 2022-01-17 11:32:07 | Bancal 3 | 28 |
| 11 | 2022-01-17 13:30:06 | Bancal 1 | 23 |
I want to make a query by intervals of 1 hour and fill in the empty spaces, but I want it to start exactly at the indicated hour and minute, if I say to start from the date time '2022-01-16 12:38:52'
then the intervals of 1 hour should be:
2022-01-16 12:38:52
2022-01-16 13:38:52
2022-01-16 14:38:52
2022-01-16 15:38:52
.
.
.
2022-01-17 09:38:52
2022-01-17 10:38:52
2022-01-17 11:38:52
2022-01-17 12:38:52
2022-01-17 13:38:52
Use the time_bucket_gapfill function of timescaledb but the gaps are made at the beginning of the hour:
SELECT time_bucket_gapfill(interval '1 hour', timestamp,) AS init,
name,
avg(temp) AS avg_temp
FROM forecast
WHERE timestamp >= '2022-01-16 12:38:52' AND timestamp<= '2022-01-17 13:38:52'
GROUP BY name, init
ORDER BY init;
| init | name | avg_temp
2022-01-16 12:00:00.000000 | Bancal 2 |
2022-01-16 12:00:00.000000 | Bancal 1 | 21.5
2022-01-16 12:00:00.000000 | Bancal 3 |
2022-01-16 13:00:00.000000 | Bancal 1 | 18.3333333333333333
2022-01-16 13:00:00.000000 | Bancal 3 |
2022-01-16 13:00:00.000000 | Bancal 2 |
2022-01-16 14:00:00.000000 | Bancal 3 |
2022-01-16 14:00:00.000000 | Bancal 1 |
2022-01-16 14:00:00.000000 | Bancal 2 |
2022-01-16 15:00:00.000000 | Bancal 2 | 15
2022-01-16 15:00:00.000000 | Bancal 1 | 18
2022-01-16 15:00:00.000000 | Bancal 3 |
...
2022-01-17 09:00:00.000000 | Bancal 1 |
2022-01-17 10:00:00.000000 | Bancal 2 | 23
2022-01-17 10:00:00.000000 | Bancal 3 |
2022-01-17 10:00:00.000000 | Bancal 1 |
2022-01-17 11:00:00.000000 | Bancal 2 |
2022-01-17 11:00:00.000000 | Bancal 1 | 12
2022-01-17 11:00:00.000000 | Bancal 3 | 28
2022-01-17 12:00:00.000000 | Bancal 2 |
2022-01-17 12:00:00.000000 | Bancal 1 |
2022-01-17 12:00:00.000000 | Bancal 3 |
2022-01-17 13:00:00.000000 | Bancal 3 |
2022-01-17 13:00:00.000000 | Bancal 1 | 23
2022-01-17 13:00:00.000000 | Bancal 2 |
The result of avg is unexpected as it takes the data from '2022-01-16 12:00:00' to '2022-01-16 13:00:00'
instead of '2022-01-16 12:38: 52' to '2022-01-16 13:38:52'
Is there a way for time_bucket_gapfill to do those gaps?
EXPECTED:
| init | name | avg_temp
2022-01-16 12:38:00.000000 | Bancal 2 |
2022-01-16 12:38:00.000000 | Bancal 1 | 20.75
2022-01-16 12:38:00.000000 | Bancal 3 |
2022-01-16 13:38:00.000000 | Bancal 1 | 15
2022-01-16 13:38:00.000000 | Bancal 3 |
2022-01-16 13:38:00.000000 | Bancal 2 |
2022-01-16 14:38:00.000000 | Bancal 3 |
2022-01-16 14:38:00.000000 | Bancal 1 |
2022-01-16 14:38:00.000000 | Bancal 2 |
2022-01-16 15:38:00.000000 | Bancal 2 | 15
2022-01-16 15:38:00.000000 | Bancal 1 | 18
2022-01-16 15:38:00.000000 | Bancal 3 |
...
2022-01-17 09:38:00.000000 | Bancal 2 | 23
2022-01-17 10:38:00.000000 | Bancal 2 |
2022-01-17 10:38:00.000000 | Bancal 3 | 28
2022-01-17 10:38:00.000000 | Bancal 1 | 12
2022-01-17 11:38:00.000000 | Bancal 2 |
2022-01-17 11:38:00.000000 | Bancal 1 |
2022-01-17 11:38:00.000000 | Bancal 3 |
2022-01-17 12:38:00.000000 | Bancal 2 |
2022-01-17 12:38:00.000000 | Bancal 1 | 23
2022-01-17 12:38:00.000000 | Bancal 3 |
2022-01-17 13:38:00.000000 | Bancal 3 |
2022-01-17 13:38:00.000000 | Bancal 1 |
2022-01-17 13:38:00.000000 | Bancal 2 |
CodePudding user response:
I would use generate_series function.
generate_series(start, stop, step interval)
The third parameter you can write your expect interval. In your case that might be 1 hours
SELECT *
FROM generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') v
EDIT
You can try to use CTE or subquery to make a calendar to represent time gap per name
, then use LEAD
window function to get the timestamp interval for JOIN
condition.
WITH CTE AS (
SELECT DISTINCT
name,
generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') dt
FROM forecast
)
SELECT t1.name init,
t1.dt,
avg(coalesce(t2.temp,0)) AS avg_temp
FROM (
SELECT *,LEAD(dt) OVER(PARTITION BY name ORDER BY dt) n_dt
FROM CTE
) t1
LEFT JOIN forecast t2
ON t1.name = t2.name AND t2.timestamp BETWEEN t1.dt AND t1.n_dt
GROUP BY t1.name,
t1.dt
ORDER BY t1.dt