Home > Mobile >  Select rows and those with timestamps close to it
Select rows and those with timestamps close to it

Time:11-23

I have a table containing timeseries data and want to select a number of rows based on some "first" criteria as well as those with timestamps close to those select based on the first criteria.

Example table:

ts val
2022-11-01 09:55:00 1
2022-11-01 09:55:57 1
2022-11-01 10:00:00 1
2022-11-01 10:00:10 1
2022-11-01 10:00:20 1
2022-11-01 10:00:25 2
2022-11-01 10:00:30 1
2022-11-01 10:00:57 1
2022-11-01 10:20:15 1
2022-11-01 10:20:35 5
2022-11-01 10:20:55 1
2022-11-01 10:21:01 1
2022-11-01 10:21:30 1

From this table, I want to SELECT * FROM table WHERE val > 1 plus any rows with ts close to those rows, for example a) within /- 30 second time difference or b) within the same "absolute" minute (e.g. from second 00 of the minute of the timestamp until second 59.

So as a result of option a), I would like to get:

ts val comment
2022-11-01 09:55:57 1 Data within 30 s of 2022-11-01 10:00:25
2022-11-01 10:00:00 1 Data within 30 s of 2022-11-01 10:00:25
2022-11-01 10:00:10 1 Data within 30 s of 2022-11-01 10:00:25
2022-11-01 10:00:20 1 Data within 30 s of 2022-11-01 10:00:25
2022-11-01 10:00:25 2 Data within 30 s of 2022-11-01 10:00:25
2022-11-01 10:00:30 1 Data within 30 s of 2022-11-01 10:20:35
2022-11-01 10:20:15 1 Data within 30 s of 2022-11-01 10:20:35
2022-11-01 10:20:35 5 Data within 30 s of 2022-11-01 10:20:35
2022-11-01 10:20:55 1 Data within 30 s of 2022-11-01 10:20:35

or in case of option b), So as a result of option a), I would like to get:

ts val comment
2022-11-01 10:00:00 1 Same minute as 2022-11-01 10:00:25
2022-11-01 10:00:10 1 Same minute as 2022-11-01 10:00:25
2022-11-01 10:00:20 1 Same minute as 2022-11-01 10:00:25
2022-11-01 10:00:25 2 Same minute as 2022-11-01 10:00:25
2022-11-01 10:00:30 1 Same minute as 2022-11-01 10:20:35
2022-11-01 10:00:57 1 Same minute as 2022-11-01 10:20:35
2022-11-01 10:20:15 1 Same minute as 2022-11-01 10:20:35
2022-11-01 10:20:35 5 Same minute as 2022-11-01 10:20:35
2022-11-01 10:20:55 1 Same minute as 2022-11-01 10:20:35

I tried to get option a) working using a subquery and DATEDIFF, but it seems this won't work because it's not possible to use DATEDIFF and such functions on subqueries. This is the query I tried:

SELECT * FROM table t WHERE ABS(DATEDIFF(second, t.ts,  (SELECT ts FROM table WHERE val > 1))) <= 30

What's the best way to get this working only using SQL and for large tables?

CodePudding user response:

What you want is still somewhat unclear for me, but here is an example of using DATEDIFF to calculate the minute difference between timestamps (see this identical fiddle):

Create the table:

CREATE TABLE events (
  event_ts DATETIME,
  event_value INTEGER
);

Populate with data (I've added a row to better illustrate the query):

INSERT INTO events VALUES
  ('2022-11-01 10:00:00', 1),
  ('2022-11-01 10:00:10', 1),
  ('2022-11-01 10:00:20', 1),
  ('2022-11-01 10:00:25', 2),
  ('2022-11-01 10:00:30', 1),
  ('2022-11-01 10:00:55', 1),
  ('2022-11-01 10:20:15', 1),
  ('2022-11-01 10:20:35', 5),
  ('2022-11-01 10:20:55', 1),
  ('2022-11-01 10:30:55', 6)

Query:

WITH events_filtered AS (
  SELECT *
  FROM events
  WHERE event_value > 1
)

SELECT
  *,
  DATEDIFF(mi, e1.event_ts, e2.event_ts) AS event_minute_difference
FROM events_filtered AS e1
INNER JOIN events_filtered AS e2
ON e1.event_ts < e2.event_ts

I've split the two conditions into a CTE and a final query for the sake of clarity (but you could and probably should contain this in the same query). The result is:

event_ts event_value event_ts event_value event_minute_difference
2022-11-01 10:00:25.000 2 2022-11-01 10:20:35.000 5 20
2022-11-01 10:00:25.000 2 2022-11-01 10:30:55.000 6 30
2022-11-01 10:20:35.000 5 2022-11-01 10:30:55.000 6 10

I cannot comment of performance (and opportunities for improving performance) without really knowing more about the problem.

CodePudding user response:

Looking at your first requirement (a question really should focus on a single problem only), one way you can accomplish this is by using an exists semi-join, comparing each ts date with qualifying min and max dates derived in the exists criteria.

Your second (or any other) criteria can work a very similar way.

select t.*
from t
where exists (
  select * from t t2
    cross apply (values(
      case when val > 1 then DateAdd(second, -30, ts) end,
      case when val > 1 then DateAdd(second, 30, ts) end)
    )r(rmin,rmin)
  where t.ts >= rmin and t.ts <= rmin
) order by ts;

See demo Fiddle

  • Related