Home > Software engineering >  append dates to table and overwrite it once data is available BigQuery Standard SQL
append dates to table and overwrite it once data is available BigQuery Standard SQL

Time:02-19

I have a my_table in BigQuery that is being populated on the first of each month. The data looks like this:

date          rate    
01/01/22      1.5
01/02/22      1.4
...
01/31/22      1.7

On March 1st, 2022 I will have February data (day by day). But while the data for February is not available I need to append rows every day with current_date and null for rate column and then overwrite it once the data is available on March 1st. Desired output is:

date          rate    
01/01/22      1.5
01/02/22      1.4
...
01/31/22      1.7
02/01/22      null
02/02/22      null
...
02/17/22      null

CodePudding user response:

You can use the following logic to automate your process:

  1. On the first of every month, check for the arrival of a new file. If its there, add its content to a separate table (call it monthly_table).

  2. On a daily basis, run a query which combines the output from monthly_table and my_table and overwrites my_table with the merged output.

For (#1), you can run a command which loads data from your file (assuming its a CSV) into monthly_table. This can be done with bq command line utility, with --noreplace flag to ensure that the new data is always appended:

bq load \
    --source_format=CSV \
    --skip_leading_rows=2 \
    --noreplace \
    mydataset.monthly_table \
    gs://mybucket/data_for_2022_02_01.csv \
    '[{"name": "date", "type": "STRING"}, {"name": "rate", "type": "FLOAT64"}]'

For (#2), you can run a query on daily basis which merges both tables, and then overwrite the my_table with the output:

select date, max(rate) as rate from (
   select date, rate from mydataset.monthly_table
      union all
   select date, rate from mydataset.my_table
      union all
   select format_date('%m/%d/%Y', current_date()) as date, null as rate 
)
group by date
order by date

You can also use bq utility to run the second query (with --replace=true) flag.

  • Related