Home > database >  SQL to get data on top of the hour from a time series database
SQL to get data on top of the hour from a time series database

Time:04-12

I have a third party app that writes to Vertica database every 5 minutes. As a result, a sample table looks like this:

CREATE TABLE sample (
    item_id int,
    metric_val float,
    ts timestamp
);

-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00')
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00')
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00')
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00')
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00')
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00')
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00')
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00')
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00')
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00')
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00')

As a result, there are 288 (24 hours * 12 entries each hour) rows for each day for a given item. I want to retrieve the records at the top of each hour i.e.

1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00

I tried the below query but the challenge is to increment the value of 'n'

WITH a AS (
    SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
    FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1

The Vertica TIME_SLICE function seems promising but I couldn't make that work even after multiple attempts. Can this please be advised?

SELECT version();
Vertica Analytic Database v10.1.1-0

CodePudding user response:

Seems pretty simple - or do I miss something?

Just filter out the rows whose ts truncated to the hour ('HH') is equal to ts ...

WITH sample (item_id, metric_val, ts) AS (                                                                                                                  
-- Hypothetical sample values in 2nd column; these can be any values
            SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
  UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
  UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
  UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
  UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
  UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
  UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
  UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
  UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
  UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
  UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
)
SELECT
  *
FROM sample
WHERE TRUNC(ts,'HH') = ts;
-- out  item_id | metric_val |         ts          
-- out --------- ------------ ---------------------
-- out        1 |       11.0 | 2022-03-29 00:00:00
-- out        1 |       12.1 | 2022-03-29 01:00:00
-- out        1 |       13.1 | 2022-03-29 02:00:00
  • Related