I have a third party app that writes to Vertica database every 5 minutes. As a result, a sample table looks like this:
CREATE TABLE sample (
item_id int,
metric_val float,
ts timestamp
);
-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00')
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00')
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00')
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00')
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00')
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00')
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00')
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00')
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00')
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00')
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00')
As a result, there are 288 (24 hours * 12 entries each hour) rows for each day for a given item. I want to retrieve the records at the top of each hour i.e.
1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00
I tried the below query but the challenge is to increment the value of 'n'
WITH a AS (
SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1
The Vertica TIME_SLICE
function seems promising but I couldn't make that work even after multiple attempts. Can this please be advised?
SELECT version();
Vertica Analytic Database v10.1.1-0
CodePudding user response:
Seems pretty simple - or do I miss something?
Just filter out the rows whose ts
truncated to the hour ('HH'
) is equal to ts
...
WITH sample (item_id, metric_val, ts) AS (
-- Hypothetical sample values in 2nd column; these can be any values
SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
)
SELECT
*
FROM sample
WHERE TRUNC(ts,'HH') = ts;
-- out item_id | metric_val | ts
-- out --------- ------------ ---------------------
-- out 1 | 11.0 | 2022-03-29 00:00:00
-- out 1 | 12.1 | 2022-03-29 01:00:00
-- out 1 | 13.1 | 2022-03-29 02:00:00