I have a table containing timeseries data and want to select a number of rows based on some "first" criteria as well as those with timestamps close to those select based on the first criteria.
Example table:
ts | val |
---|---|
2022-11-01 09:55:00 | 1 |
2022-11-01 09:55:57 | 1 |
2022-11-01 10:00:00 | 1 |
2022-11-01 10:00:10 | 1 |
2022-11-01 10:00:20 | 1 |
2022-11-01 10:00:25 | 2 |
2022-11-01 10:00:30 | 1 |
2022-11-01 10:00:57 | 1 |
2022-11-01 10:20:15 | 1 |
2022-11-01 10:20:35 | 5 |
2022-11-01 10:20:55 | 1 |
2022-11-01 10:21:01 | 1 |
2022-11-01 10:21:30 | 1 |
From this table, I want to SELECT * FROM table WHERE val > 1
plus any rows with ts
close to those rows, for example a) within /- 30 second time difference or b) within the same "absolute" minute (e.g. from second 00 of the minute of the timestamp until second 59.
So as a result of option a), I would like to get:
ts | val | comment |
---|---|---|
2022-11-01 09:55:57 | 1 | Data within 30 s of 2022-11-01 10:00:25 |
2022-11-01 10:00:00 | 1 | Data within 30 s of 2022-11-01 10:00:25 |
2022-11-01 10:00:10 | 1 | Data within 30 s of 2022-11-01 10:00:25 |
2022-11-01 10:00:20 | 1 | Data within 30 s of 2022-11-01 10:00:25 |
2022-11-01 10:00:25 | 2 | Data within 30 s of 2022-11-01 10:00:25 |
2022-11-01 10:00:30 | 1 | Data within 30 s of 2022-11-01 10:20:35 |
2022-11-01 10:20:15 | 1 | Data within 30 s of 2022-11-01 10:20:35 |
2022-11-01 10:20:35 | 5 | Data within 30 s of 2022-11-01 10:20:35 |
2022-11-01 10:20:55 | 1 | Data within 30 s of 2022-11-01 10:20:35 |
or in case of option b), So as a result of option a), I would like to get:
ts | val | comment |
---|---|---|
2022-11-01 10:00:00 | 1 | Same minute as 2022-11-01 10:00:25 |
2022-11-01 10:00:10 | 1 | Same minute as 2022-11-01 10:00:25 |
2022-11-01 10:00:20 | 1 | Same minute as 2022-11-01 10:00:25 |
2022-11-01 10:00:25 | 2 | Same minute as 2022-11-01 10:00:25 |
2022-11-01 10:00:30 | 1 | Same minute as 2022-11-01 10:20:35 |
2022-11-01 10:00:57 | 1 | Same minute as 2022-11-01 10:20:35 |
2022-11-01 10:20:15 | 1 | Same minute as 2022-11-01 10:20:35 |
2022-11-01 10:20:35 | 5 | Same minute as 2022-11-01 10:20:35 |
2022-11-01 10:20:55 | 1 | Same minute as 2022-11-01 10:20:35 |
I tried to get option a) working using a subquery and DATEDIFF
, but it seems this won't work because it's not possible to use DATEDIFF
and such functions on subqueries. This is the query I tried:
SELECT * FROM table t WHERE ABS(DATEDIFF(second, t.ts, (SELECT ts FROM table WHERE val > 1))) <= 30
What's the best way to get this working only using SQL and for large tables?
CodePudding user response:
What you want is still somewhat unclear for me, but here is an example of using DATEDIFF
to calculate the minute difference between timestamps (see this identical fiddle):
Create the table:
CREATE TABLE events (
event_ts DATETIME,
event_value INTEGER
);
Populate with data (I've added a row to better illustrate the query):
INSERT INTO events VALUES
('2022-11-01 10:00:00', 1),
('2022-11-01 10:00:10', 1),
('2022-11-01 10:00:20', 1),
('2022-11-01 10:00:25', 2),
('2022-11-01 10:00:30', 1),
('2022-11-01 10:00:55', 1),
('2022-11-01 10:20:15', 1),
('2022-11-01 10:20:35', 5),
('2022-11-01 10:20:55', 1),
('2022-11-01 10:30:55', 6)
Query:
WITH events_filtered AS (
SELECT *
FROM events
WHERE event_value > 1
)
SELECT
*,
DATEDIFF(mi, e1.event_ts, e2.event_ts) AS event_minute_difference
FROM events_filtered AS e1
INNER JOIN events_filtered AS e2
ON e1.event_ts < e2.event_ts
I've split the two conditions into a CTE and a final query for the sake of clarity (but you could and probably should contain this in the same query). The result is:
event_ts | event_value | event_ts | event_value | event_minute_difference |
---|---|---|---|---|
2022-11-01 10:00:25.000 | 2 | 2022-11-01 10:20:35.000 | 5 | 20 |
2022-11-01 10:00:25.000 | 2 | 2022-11-01 10:30:55.000 | 6 | 30 |
2022-11-01 10:20:35.000 | 5 | 2022-11-01 10:30:55.000 | 6 | 10 |
I cannot comment of performance (and opportunities for improving performance) without really knowing more about the problem.
CodePudding user response:
Looking at your first requirement (a question really should focus on a single problem only), one way you can accomplish this is by using an exists semi-join, comparing each ts
date with qualifying min and max dates derived in the exists criteria.
Your second (or any other) criteria can work a very similar way.
select t.*
from t
where exists (
select * from t t2
cross apply (values(
case when val > 1 then DateAdd(second, -30, ts) end,
case when val > 1 then DateAdd(second, 30, ts) end)
)r(rmin,rmin)
where t.ts >= rmin and t.ts <= rmin
) order by ts;
See demo Fiddle