Home > Blockchain >  choose latest partition of a Bigquery table where filter over partition column is required
choose latest partition of a Bigquery table where filter over partition column is required

Time:12-16

I have been using the following query

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) 

It is not ideal as the partition could be unavailable due to delay.. Thus I try the following queries

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )

as well as

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE TIMESTAMP(DATE(_PARTITIONTIME)) IN 
    (
      SELECT parse_timestamp("%Y%m%d", MAX(partition_id))
      FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
      WHERE table_name = 'table'
    )

Neither of them work due to

Cannot query over table 'project.dataset.table' without a filter over column(s) '_PARTITION_LOAD_TIME', '_PARTITIONDATE', '_PARTITIONTIME' that can be used for partition elimination.

CodePudding user response:

In both of your solutions the limiting filter for the partition column is calculated during the query. This lead to full table scan.

Therfore, you need to add a filter for the partition column which is always know at the beginning of the run of your query.

SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )
AND DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

If the last partition date could be months back, this is a better solution:


Declare max_date date;

execute immediate
"""
SELECT max(date(_PARTITIONTIME)) FROM `project.dataset.table`
WHERE DATE(_PARTITIONTIME) > "2000-12-15"  
""" into max_date;

execute immediate
"""
Select * from `project.dataset.table`  where date(_PARTITIONTIME)= date('"""  || max_date || "')"
  • Related