Home > Net >  start from hours and minutes with time_bucket_gapfill
start from hours and minutes with time_bucket_gapfill

Time:02-18

I have the next forecast table:

   |        id       |      timestamp      |     name    | temp      |
    ------------------------------------------------------------------
    |        1       | 2022-01-16 12:40:06 |  Bancal 1   |     22    |
    |        2       | 2022-01-16 12:58:05 |  Bancal 1   |     21    |
    |        3       | 2022-01-16 13:22:00 |  Bancal 1   |     30    |
    |        4       | 2022-01-16 13:30:20 |  Bancal 1   |     10    |
    |        5       | 2022-01-16 13:59:06 |  Bancal 1   |     15    |
    |        6       | 2022-01-16 15:40:00 |  Bancal 2   |     15    |
    |        7       | 2022-01-16 15:54:06 |  Bancal 1   |     18    |
    |        8       | 2022-01-17 10:30:05 |  Bancal 2   |     23    |
    |        9       | 2022-01-17 11:20:00 |  Bancal 1   |     12    |
    |        10      | 2022-01-17 11:32:07 |  Bancal 3   |     28    |
    |        11       | 2022-01-17 13:30:06 |  Bancal 1   |     23    |

I want to make a query by intervals of 1 hour and fill in the empty spaces, but I want it to start exactly at the indicated hour and minute, if I say to start from the date time '2022-01-16 12:38:52' then the intervals of 1 hour should be:

2022-01-16 12:38:52
2022-01-16 13:38:52
2022-01-16 14:38:52
2022-01-16 15:38:52
       .
       .
       .
2022-01-17 09:38:52
2022-01-17 10:38:52
2022-01-17 11:38:52
2022-01-17 12:38:52
2022-01-17 13:38:52

Use the time_bucket_gapfill function of timescaledb but the gaps are made at the beginning of the hour:

    SELECT time_bucket_gapfill(interval '1 hour', timestamp,) AS init,
       name,
       avg(temp) AS avg_temp
    FROM forecast
    WHERE timestamp >= '2022-01-16 12:38:52' AND timestamp<= '2022-01-17 13:38:52'
    GROUP BY name, init
    ORDER BY init;

|          init            |    name  |  avg_temp 
2022-01-16 12:00:00.000000 | Bancal 2 |
2022-01-16 12:00:00.000000 | Bancal 1 |  21.5
2022-01-16 12:00:00.000000 | Bancal 3 |  
2022-01-16 13:00:00.000000 | Bancal 1 |  18.3333333333333333
2022-01-16 13:00:00.000000 | Bancal 3 |  
2022-01-16 13:00:00.000000 | Bancal 2 |  
2022-01-16 14:00:00.000000 | Bancal 3 |  
2022-01-16 14:00:00.000000 | Bancal 1 |  
2022-01-16 14:00:00.000000 | Bancal 2 |  
2022-01-16 15:00:00.000000 | Bancal 2 |  15
2022-01-16 15:00:00.000000 | Bancal 1 |  18
2022-01-16 15:00:00.000000 | Bancal 3 |  
...
2022-01-17 09:00:00.000000 | Bancal 1 |  
2022-01-17 10:00:00.000000 | Bancal 2 |  23
2022-01-17 10:00:00.000000 | Bancal 3 |  
2022-01-17 10:00:00.000000 | Bancal 1 |  
2022-01-17 11:00:00.000000 | Bancal 2 |  
2022-01-17 11:00:00.000000 | Bancal 1 |  12
2022-01-17 11:00:00.000000 | Bancal 3 |  28
2022-01-17 12:00:00.000000 | Bancal 2 |  
2022-01-17 12:00:00.000000 | Bancal 1 |  
2022-01-17 12:00:00.000000 | Bancal 3 |  
2022-01-17 13:00:00.000000 | Bancal 3 |  
2022-01-17 13:00:00.000000 | Bancal 1 |  23
2022-01-17 13:00:00.000000 | Bancal 2 |  

The result of avg is unexpected as it takes the data from '2022-01-16 12:00:00' to '2022-01-16 13:00:00' instead of '2022-01-16 12:38: 52' to '2022-01-16 13:38:52' Is there a way for time_bucket_gapfill to do those gaps?

EXPECTED:

|          init            |    name  |  avg_temp 
2022-01-16 12:38:00.000000 | Bancal 2 |
2022-01-16 12:38:00.000000 | Bancal 1 |  20.75
2022-01-16 12:38:00.000000 | Bancal 3 |  
2022-01-16 13:38:00.000000 | Bancal 1 |  15
2022-01-16 13:38:00.000000 | Bancal 3 |  
2022-01-16 13:38:00.000000 | Bancal 2 |  
2022-01-16 14:38:00.000000 | Bancal 3 |  
2022-01-16 14:38:00.000000 | Bancal 1 |  
2022-01-16 14:38:00.000000 | Bancal 2 |  
2022-01-16 15:38:00.000000 | Bancal 2 |  15
2022-01-16 15:38:00.000000 | Bancal 1 |  18
2022-01-16 15:38:00.000000 | Bancal 3 |  
...
2022-01-17 09:38:00.000000 | Bancal 2 |  23
2022-01-17 10:38:00.000000 | Bancal 2 |  
2022-01-17 10:38:00.000000 | Bancal 3 |  28
2022-01-17 10:38:00.000000 | Bancal 1 |  12
2022-01-17 11:38:00.000000 | Bancal 2 |  
2022-01-17 11:38:00.000000 | Bancal 1 |  
2022-01-17 11:38:00.000000 | Bancal 3 |  
2022-01-17 12:38:00.000000 | Bancal 2 |  
2022-01-17 12:38:00.000000 | Bancal 1 |  23
2022-01-17 12:38:00.000000 | Bancal 3 |  
2022-01-17 13:38:00.000000 | Bancal 3 |  
2022-01-17 13:38:00.000000 | Bancal 1 |  
2022-01-17 13:38:00.000000 | Bancal 2 |  

CodePudding user response:

I would use generate_series function.

generate_series(start, stop, step interval)

The third parameter you can write your expect interval. In your case that might be 1 hours

SELECT *
FROM generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') v

sqlfiddle

EDIT

You can try to use CTE or subquery to make a calendar to represent time gap per name, then use LEAD window function to get the timestamp interval for JOIN condition.

WITH CTE AS (
  SELECT DISTINCT  
         name,
         generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') dt
  FROM forecast
)
SELECT  t1.name init,
        t1.dt,
        avg(coalesce(t2.temp,0)) AS avg_temp
FROM (
   SELECT *,LEAD(dt) OVER(PARTITION BY name ORDER BY dt) n_dt
   FROM CTE 
) t1
LEFT JOIN forecast t2
ON t1.name = t2.name  AND t2.timestamp BETWEEN t1.dt AND t1.n_dt
GROUP BY t1.name,
        t1.dt
ORDER BY t1.dt

sqlfiddle

  • Related