Home > other >  Optimal way of having multiple BETWEEN condition on the same column
Optimal way of having multiple BETWEEN condition on the same column

Time:01-08

Is it possible to optimize a BigQuery query time complexity that has multiple BETWEEN conditions on the same column?

Query example:

SELECT time, value
FROM `table`
WHERE ((time between "2019-04-06 20:40:00" AND "2019-04-07 21:00:00")
    OR (time between "2020-05-04 17:12:06" AND "2020-05-04 17:37:36")
    OR (time between "2019-01-16 15:10:00" AND "2019-01-16 15:45:00")
    OR (time between "2020-04-09 20:55:36" AND "2020-04-09 21:01:36")
    OR (time between "2019-07-08 10:05:00" AND "2019-07-08 10:45:00")
    OR (time between "2021-06-07 20:02:32" AND "2021-06-07 20:30:56")
    OR (time between "2019-11-03 14:55:20" AND "2019-11-03 15:17:24"))

There are more between conditions in the real query (around 60).

The full table contains >3TB of data

CodePudding user response:

Consider below approach

WITH time_ranges as (
  select timestamp '2019-04-06 20:40:00' start_time, timestamp '2019-04-07 21:00:00' end_time union all
  select '2020-05-04 17:12:06', '2020-05-04 17:37:36' union all
  select '2019-01-16 15:10:00', '2019-01-16 15:45:00' union all
  select '2020-04-09 20:55:36', '2020-04-09 21:01:36' union all
  select '2019-07-08 10:05:00', '2019-07-08 10:45:00' union all
  select '2021-06-07 20:02:32', '2021-06-07 20:30:56' union all
  select '2019-11-03 14:55:20', '2019-11-03 15:17:24'     
)
SELECT time, value
FROM your_table
WHERE (
  SELECT LOGICAL_OR(time between start_time and end_time)
  FROM time_ranges
)

CodePudding user response:

Are you able to save your criteria into a table?

If your time windows are non-overlapping, you could try a cross join and filter:

select time, value
from your_table
cross join time_windows
where time between start_time and end_time

where time_windows would something like

start_time,            end_time
'2019-04-06 20:40:00' | '2019-04-07 21:00:00'
'2020-05-04 17:12:06' | '2020-05-04 17:37:36'
etc...

with both fields already saved as timestamps.

A cross join against a 3TB table should be reasonably quick if this is the actual complexity of the query. Adding in other joins and filters could also slow it down.

Additionally, if you are able to modify your table to be partitioned on the time column, BQ might be able to optimize your query even more.

  •  Tags:  
  • Related