I have the storage that stores huge time series data and I can extract data through AWS athena. However, I don't have any ideas how I can extract time series data before and after a specific event by using AWS athena.
What query can achieve it ?
Does anyone have ideas and samples of query on Athena ?
For example, I have following input data.
<input data>
id | timestamp | value | level |
---------------------------------------------
1 | 2021-04-01T12:00:00 00:00 | 100.0 | 1 |
2 | 2021-04-01T12:00:10 00:00 | 98.0 | 1 |
3 | 2021-04-01T12:00:20 00:00 | 99.5 | 1 |
...
58 | 2021-04-01T12:09:40 00:00 | 98.2 | 1 |
59 | 2021-04-01T12:09:50 00:00 | 95.3 | 1 |
60 | 2021-04-01T12:10:00 00:00 | 99.2 | 1 |
61 | 2021-04-01T12:10:10 00:00 | 97.6 | 2 |
62 | 2021-04-01T12:10:20 00:00 | 98.6 | 2 |
63 | 2021-04-01T12:10:30 00:00 | 98.3 | 2 |
64 | 2021-04-01T12:10:40 00:00 | 98.1 | 2 |
...
100 | 2021-04-01T12:16:40 00:00 | 97.6 | 2 |
What I want to do is to extract the records for 30 sec before and after level 1->2
change event.
In this case, the expected outputs are data from id:58 to id:64.
CodePudding user response:
You can use 'lag' function to determine timestamps where the level changes:
SELECT *
FROM (SELECT *
FROM (SELECT timestamp,
lag(level) OVER (order by timestamp) AS prev_level,
level
FROM dataset)
WHERE prev_level != level)
And then use those timestamps to filter out the dataset. For example something like this:
WITH dataset(id,timestamp,value,level) AS (
VALUES
('1',timestamp '2021-04-01 12:00:00 00:00',100.0,1),
('2',timestamp '2021-04-01 12:00:10 00:00',98.0,1),
('3',timestamp '2021-04-01 12:00:20 00:00',99.5,1),
('58',timestamp '2021-04-01 12:09:40 00:00',98.2,1),
('59',timestamp '2021-04-01 12:09:50 00:00',95.3,1),
('60',timestamp '2021-04-01 12:10:00 00:00',99.2,1),
('61',timestamp '2021-04-01 12:10:10 00:00',97.6,2),
('62',timestamp '2021-04-01 12:10:20 00:00',98.6,2),
('63',timestamp '2021-04-01 12:10:30 00:00',98.3,2),
('64',timestamp '2021-04-01 12:10:40 00:00',98.1,2),
('100',timestamp '2021-04-01 12:16:40 00:00',97.6,2)
)
SELECT *
FROM dataset o
WHERE EXISTS(
SELECT *
FROM (SELECT *
FROM (SELECT timestamp,
lag(level) OVER (order by timestamp) AS prev_level,
level
FROM dataset)
WHERE prev_level != level)
WHERE (o.level = level AND o.timestamp - timestamp < interval '30' second)
OR (o.level = prev_level AND timestamp - o.timestamp < interval '30' second)
)
Output:
id | timestamp | value | level |
---|---|---|---|
59 | 2021-04-01 12:09:50.000 UTC | 95.3 | 1 |
60 | 2021-04-01 12:10:00.000 UTC | 99.2 | 1 |
61 | 2021-04-01 12:10:10.000 UTC | 97.6 | 2 |
62 | 2021-04-01 12:10:20.000 UTC | 98.6 | 2 |
63 | 2021-04-01 12:10:30.000 UTC | 98.3 | 2 |