I'm trying to update a table in BigQuery that is partitioned on _PARTITIONTIME and really struggling.
Source is an extract from destination that I need to backfill destination with. Destination is a large partitioned table.
To move data from source to destination, I tried this:
update t1 AS destination
set destination._PARTITIONTIME = '2022-02-09'
from t2 as source
WHERE source.id <> "1";
Because it said that the WHERE clause was required for UPDATE, but when I run it, I get a message that "update/merge must match at most one source row for each target row". I've tried... so many other methods that I can't even remember them all. INSERT INTO seemed like a no-brainer early on but it wants me to specify column names and these tables have about 800 columns each so that's less than ideal.
I would have expected this most recent attempt to work because if I do
select * from source where source.id <> "1";
I do, in fact, get results exactly the way I would expect, so that query clearly functions, but for some reason it can't load the data. This is interesting, because I created the source table by running something along the lines of:
select * from destination where DATE(createddate) = '2022-02-09' and DATE(_PARTITIONTIME) = '2022-02-10'
Is there a way to make Insert Into work for me in this instance? If there is not, does someone have an alternate approach they recommend?
CodePudding user response:
You can use the bq
command line tool (usually comes with the gcloud
command line utility) to run a query that will overwrite a partition in a target table with your query results:
bq query --allow_large_results --replace --noflatten_results --destination_table 'target_db.target_table$20220209' "select field1, field2, field3 from source_db.source_table where _PARTITIONTIME = '2022-02-09'";
- Note the
$YYYYMMMDD
postfix with thetarget_table
. This indicates that the partition corresponding toYYYYMMDD
is to be overwritten by the query results. - Make sure to distinctively select fields in your query (as a good practice) to avoid unexpected surprises. For instance,
select field1, field2, field3 from table
is way more explicit and readable thanselect * from table
.
CodePudding user response:
I like that and will attempt it in the morning. I've played around with the bq cli but didn't encounter that option. One question - can one use this command without specifying column names? I ask because there are about 800 and some of the options I have encountered (looking at you, Insert Into) do seem to want the names of all of them.