I recently added a new column in my BigQuery Table. The following code snippet is used in legacy code to determine the table schema
df = gbq.read_gbq('SELECT * FROM {}.{} where 1=0'.format(BIGQUERY_DATASET_NAME, table), project_id=project_id)
But the problem is that it is not returning the newly added column in the df
. Although when I use some other condition like 1=3
in where clause or limit 0
then it returns the correct schema.
Trying to understand what is causing the issue.
CodePudding user response:
If you want to get the column names - which I assume is the point of this - and can change the legacy code, perhaps a better approach would be to get it directly from the INFORMATION_SCHEMA view.
An example would be as follows:
schema_query = f"""
SELECT column_name
FROM {BIGQUERY_DATASET_NAME}.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = '{table}'
"""
df = gbq.read_gbq(schema_query, project_id=project_id)
(if using python <3.6, revert to the .format
syntax of course)
I have a hunch that this will avoid the problem you're encountering using the legacy code you're working with. I agree with the other answer that it's quite possible you're seeing cached results.
CodePudding user response:
I'm guessing, it has to do with caching , you can always ask BQ not to use caching :
df = gbq.read_gbq('SELECT * FROM {}.{} where 1=0'.format(...)
, configuration = {'query': {'useQueryCache': False}} )