I have been using the following query
SELECT DISTINCT
*
FROM
`project.dataset.table` t
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
It is not ideal as the partition could be unavailable due to delay.. Thus I try the following queries
SELECT DISTINCT
*
FROM
`project.dataset.table` t
WHERE DATE(_PARTITIONTIME) IN
(
SELECT
MAX(DATE(_PARTITIONTIME)) AS max_partition
FROM `project.dataset.table`
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
as well as
SELECT DISTINCT
*
FROM
`project.dataset.table` t
WHERE TIMESTAMP(DATE(_PARTITIONTIME)) IN
(
SELECT parse_timestamp("%Y%m%d", MAX(partition_id))
FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'table'
)
Neither of them work due to
Cannot query over table 'project.dataset.table' without a filter over column(s) '_PARTITION_LOAD_TIME', '_PARTITIONDATE', '_PARTITIONTIME' that can be used for partition elimination.
CodePudding user response:
In both of your solutions the limiting filter for the partition column is calculated during the query. This lead to full table scan.
Therfore, you need to add a filter for the partition column which is always know at the beginning of the run of your query.
SELECT DISTINCT
*
FROM
`project.dataset.table` t
WHERE DATE(_PARTITIONTIME) IN
(
SELECT
MAX(DATE(_PARTITIONTIME)) AS max_partition
FROM `project.dataset.table`
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
AND DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
If the last partition date could be months back, this is a better solution:
Declare max_date date;
execute immediate
"""
SELECT max(date(_PARTITIONTIME)) FROM `project.dataset.table`
WHERE DATE(_PARTITIONTIME) > "2000-12-15"
""" into max_date;
execute immediate
"""
Select * from `project.dataset.table` where date(_PARTITIONTIME)= date('""" || max_date || "')"