I'm working on a small Python script to write data I get from an API into a BigQuery table. I only have a small amount of data. The data from the API is updated on a daily basis, and I want to overwrite the destination table with the most recent data every day.
I have the following script in a Cloud Function:
data = {'col': 'value'} # the JSON response I get from the API (simplified output)
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("col", "STRING"),
    ],
    write_disposition="WRITE_TRUNCATE",
)
job = bq_client.insert_rows_json(table_id, [data], job_config=job_config)
and I get the following error: Client.insert_rows_json() got an unexpected keyword argument 'job_config'
Should I use a different method than insert_rows_json()? What's the best way to write this new data into the same table every day?
CodePudding user response:
I think it's the best way, but you can't pass a JobConfig (and therefore the WRITE_TRUNCATE mode) to insert_rows_json. You have to execute a separate query to truncate the table first, then append your dicts to BigQuery with insert_rows_json:
from google.cloud import bigquery

data = {'col': 'value'}  # the JSON response I get from the API (simplified output)

bq_client = bigquery.Client()

# Truncate the table first
query_job = bq_client.query(f"TRUNCATE TABLE {table_id}")
query_job.result()  # wait for the truncate to finish

# Append your dict(s) from the API to BigQuery
# insert_rows_json returns a list of per-row errors (empty list on success)
errors = bq_client.insert_rows_json(table_id, [data])
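Note that insert_rows_json does not return a job object; it returns a list of per-row errors, which is empty when every row was inserted. A small check like this sketch (raising is just one option) makes failures visible in the Cloud Function logs:

if errors:
    raise RuntimeError(f"Some rows could not be inserted: {errors}")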
Under the hood, insert_rows_json uses the insertAll API.
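For what it's worth, the LoadJobConfig with WRITE_TRUNCATE from your question does work if you run a load job instead of a streaming insert: load_table_from_json accepts a job_config and replaces the table in a single job. A minimal sketch, reusing the data and table_id from above:

from google.cloud import bigquery

bq_client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("col", "STRING")],
    write_disposition="WRITE_TRUNCATE",  # overwrite the table on every run
)

# Unlike insert_rows_json, load_table_from_json accepts a job_config
load_job = bq_client.load_table_from_json([data], table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish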