Is it possible to optimize a BigQuery query time complexity that has multiple BETWEEN
conditions on the same column?
Query example:
SELECT time, value
FROM `table`
WHERE ((time between "2019-04-06 20:40:00" AND "2019-04-07 21:00:00")
OR (time between "2020-05-04 17:12:06" AND "2020-05-04 17:37:36")
OR (time between "2019-01-16 15:10:00" AND "2019-01-16 15:45:00")
OR (time between "2020-04-09 20:55:36" AND "2020-04-09 21:01:36")
OR (time between "2019-07-08 10:05:00" AND "2019-07-08 10:45:00")
OR (time between "2021-06-07 20:02:32" AND "2021-06-07 20:30:56")
OR (time between "2019-11-03 14:55:20" AND "2019-11-03 15:17:24"))
There are more between
conditions in the real query (around 60).
The full table contains >3TB
of data
CodePudding user response:
Consider below approach
WITH time_ranges as (
select timestamp '2019-04-06 20:40:00' start_time, timestamp '2019-04-07 21:00:00' end_time union all
select '2020-05-04 17:12:06', '2020-05-04 17:37:36' union all
select '2019-01-16 15:10:00', '2019-01-16 15:45:00' union all
select '2020-04-09 20:55:36', '2020-04-09 21:01:36' union all
select '2019-07-08 10:05:00', '2019-07-08 10:45:00' union all
select '2021-06-07 20:02:32', '2021-06-07 20:30:56' union all
select '2019-11-03 14:55:20', '2019-11-03 15:17:24'
)
SELECT time, value
FROM your_table
WHERE (
SELECT LOGICAL_OR(time between start_time and end_time)
FROM time_ranges
)
CodePudding user response:
Are you able to save your criteria into a table?
If your time windows are non-overlapping, you could try a cross join and filter:
select time, value
from your_table
cross join time_windows
where time between start_time and end_time
where time_windows
would something like
start_time, end_time
'2019-04-06 20:40:00' | '2019-04-07 21:00:00'
'2020-05-04 17:12:06' | '2020-05-04 17:37:36'
etc...
with both fields already saved as timestamps.
A cross join against a 3TB table should be reasonably quick if this is the actual complexity of the query. Adding in other joins and filters could also slow it down.
Additionally, if you are able to modify your table to be partitioned on the time
column, BQ might be able to optimize your query even more.