Home > Software design >  Using Update statement with the _PARTITIONDATE Pseudo-column
Using Update statement with the _PARTITIONDATE Pseudo-column

Time:02-17

I'm trying to update a table in BigQuery that is partitioned on _PARTITIONTIME and really struggling.

Source is an extract from destination that I need to backfill destination with. Destination is a large partitioned table.

To move data from source to destination, I tried this:

update t1 AS destination
set destination._PARTITIONTIME = '2022-02-09'
from t2 as source
WHERE source.id <> "1";

Because it said that the WHERE clause was required for UPDATE, but when I run it, I get a message that "update/merge must match at most one source row for each target row". I've tried... so many other methods that I can't even remember them all. INSERT INTO seemed like a no-brainer early on but it wants me to specify column names and these tables have about 800 columns each so that's less than ideal.

I would have expected this most recent attempt to work because if I do

select * from source where source.id <> "1";

I do, in fact, get results exactly the way I would expect, so that query clearly functions, but for some reason it can't load the data. This is interesting, because I created the source table by running something along the lines of:

select * from destination where DATE(createddate) = '2022-02-09' and DATE(_PARTITIONTIME) = '2022-02-10'

Is there a way to make Insert Into work for me in this instance? If there is not, does someone have an alternate approach they recommend?

CodePudding user response:

You can use the bq command line tool (usually comes with the gcloud command line utility) to run a query that will overwrite a partition in a target table with your query results:

bq query --allow_large_results --replace --noflatten_results --destination_table 'target_db.target_table$20220209' "select field1, field2, field3 from source_db.source_table where _PARTITIONTIME = '2022-02-09'";
  • Note the $YYYYMMMDD postfix with the target_table. This indicates that the partition corresponding to YYYYMMDD is to be overwritten by the query results.
  • Make sure to distinctively select fields in your query (as a good practice) to avoid unexpected surprises. For instance, select field1, field2, field3 from table is way more explicit and readable than select * from table.

CodePudding user response:

I like that and will attempt it in the morning. I've played around with the bq cli but didn't encounter that option. One question - can one use this command without specifying column names? I ask because there are about 800 and some of the options I have encountered (looking at you, Insert Into) do seem to want the names of all of them.

  • Related